
Zabbix housekeeper failure

  • zabbix-server stops about 30 minutes after it is started.
  • The housekeeper is started immediately before zabbix-server stops.
  • zabbix_server.log
23069:20240405:155505.424 executing housekeeper
23069:20240405:155505.832 Got signal [signal:11(SIGSEGV),reason:1,refaddr:0x0]. Crashing ...
23069:20240405:155505.832 ====== Fatal information: ======
23069:20240405:155505.833 program counter not available for this architecture
23069:20240405:155505.833 === Registers: ===
23069:20240405:155505.833 register dump not available for this architecture
23069:20240405:155505.833 === Backtrace: ===
23069:20240405:155505.833 11: 0x53527b <zbx_backtrace+0x4b> at /usr/local/sbin/zabbix_server
23069:20240405:155505.833 10: 0x535422 <zbx_log_fatal_info+0x92> at /usr/local/sbin/zabbix_server
23069:20240405:155505.833 9: 0x535abb <zbx_set_common_signal_handlers+0x28b> at /usr/local/sbin/zabbix_server
23069:20240405:155505.833 8: 0x826129b6e <pthread_sigmask+0x54e> at /lib/libthr.so.3
23069:20240405:155505.833 7: 0x82612911f <pthread_setschedparam+0x83f> at /lib/libthr.so.3
23069:20240405:155505.833 6: 0x7ffffffff2d3 <???> at ???
23069:20240405:155505.833 5: 0x554b4d <is_uint_n_range+0x2d> at /usr/local/sbin/zabbix_server
23069:20240405:155505.833 4: 0x3b8fb1 <housekeeper_thread+0x4c1> at /usr/local/sbin/zabbix_server
23069:20240405:155505.833 3: 0x53757c <zbx_thread_start+0x2c> at /usr/local/sbin/zabbix_server
23069:20240405:155505.833 2: 0x39350c <MAIN_ZABBIX_ENTRY+0x13dc> at /usr/local/sbin/zabbix_server
23069:20240405:155505.834 1: 0x392813 <MAIN_ZABBIX_ENTRY+0x6e3> at /usr/local/sbin/zabbix_server
23069:20240405:155505.834 0: 0x534c82 <daemon_start+0x1b2> at /usr/local/sbin/zabbix_server
23069:20240405:155505.834 === Memory map: ===
23069:20240405:155505.834 memory map not available for this platform
23069:20240405:155505.834 ================================
23036:20240405:155505.835 One child process died (PID:23069,exitcode/signal:1). Exiting ...
23036:20240405:155505.835 PROCESS EXIT: 23069
23045:20240405:155505.835 HA manager has been paused
zabbix_server [23036]: Error waiting for process with PID 23069: [10] No child processes
23045:20240405:155505.889 HA manager has been stopped
23036:20240405:155505.896 syncing trend data...
23036:20240405:155505.908 syncing trend data done
23036:20240405:155505.909 Zabbix Server stopped. Zabbix 6.0.28 (revision 1f9d541d29a).
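
To confirm which process crashed and what it was doing, the log can be scanned for "Got signal" lines while remembering the previous message of each PID. The following is a minimal sketch, assuming the `PID:YYYYMMDD:HHMMSS.mmm message` layout seen above; the function name `find_crashes` is hypothetical and not part of Zabbix.

```python
import re

# Layout assumed from the log above: PID:YYYYMMDD:HHMMSS.mmm message
LOG_LINE = re.compile(r"^(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")

def find_crashes(lines):
    """Return (pid, timestamp, last message before the crash) per fatal signal."""
    last_message = {}  # pid -> most recent message logged by that process
    crashes = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            continue  # lines without the PID:date:time prefix are skipped
        pid, date, time, message = m.groups()
        if message.startswith("Got signal"):
            crashes.append((int(pid), f"{date} {time}", last_message.get(pid)))
        last_message[pid] = message
    return crashes

# Three lines taken from the log above.
sample = [
    "23069:20240405:155505.424 executing housekeeper",
    "23069:20240405:155505.832 Got signal [signal:11(SIGSEGV),reason:1,refaddr:0x0]. Crashing ...",
    "23036:20240405:155505.835 One child process died (PID:23069,exitcode/signal:1). Exiting ...",
]
crashes = find_crashes(sample)
print(crashes)
```

For the excerpt above this reports PID 23069, whose last message before the SIGSEGV was "executing housekeeper".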

Cause

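  • The history table contained abnormal data: of its 3857659 rows, one does not satisfy itemid>0 and psql shows its itemid as blank.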
zabbix=# select count(*) from history;
  count
---------
 3857659
(1 row)
zabbix=# select count(*) from history where itemid>0;
  count
---------
 3857658
(1 row)
zabbix=# select itemid,count(*) from history group by itemid having count(*)=1;
 itemid | count
--------+-------
        |     1
(1 row)
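
The three queries above can be folded into a single check. The sketch below is an illustration only: it uses an in-memory SQLite table as a stand-in for the PostgreSQL `history` table, invents three sample rows, and assumes the blank `itemid` is a NULL (psql prints NULL as an empty string by default). The SQL in `CHECK_SQL` is plain enough that it should also be usable from psql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table: NOT NULL is deliberately omitted so a broken row can be simulated.
conn.execute(
    "CREATE TABLE history (itemid bigint, clock integer, value double precision, ns integer)"
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [
        (10073, 1712300000, 0.5, 0),  # invented sample rows
        (10074, 1712300060, 1.5, 0),
        (None, 0, 0.0, 0),            # simulated abnormal row (assumed NULL itemid)
    ],
)

# count(itemid) ignores NULLs, so the difference is the number of NULL itemids.
CHECK_SQL = """
SELECT count(*)                                    AS total,
       count(*) - count(itemid)                    AS null_itemid,
       sum(CASE WHEN itemid <= 0 THEN 1 ELSE 0 END) AS non_positive_itemid
FROM history
"""
total, null_itemid, non_positive_itemid = conn.execute(CHECK_SQL).fetchone()
print(total, null_itemid, non_positive_itemid)
```

Separating NULL from non-positive values shows which kind of row is excluded by `itemid>0`.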

Fix

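  • The current history table was set aside under another name, a clean history table was created, and only the valid rows were copied into it. zabbix-server is stopped first.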
service zabbix6_server stop
  • Start psql
# su -l pgsql
$ psql zabbix
psql>
  • Set the history table aside
psql> alter table history rename to history_bak;
  • Check the CREATE TABLE statement for history in /usr/local/share/zabbix6/server/database/postgresql/schema.sql
CREATE TABLE history (
       itemid                   bigint                                    NOT NULL,
       clock                    integer         DEFAULT '0'               NOT NULL,
       value                    DOUBLE PRECISION DEFAULT '0.0000'          NOT NULL,
       ns                       integer         DEFAULT '0'               NOT NULL,
       PRIMARY KEY (itemid,clock,ns)
);
  • Create the history table
CREATE TABLE history (
  itemid bigint NOT NULL,
  clock  integer DEFAULT '0' NOT NULL,
  value  DOUBLE PRECISION DEFAULT '0.0000' NOT NULL,
  ns     integer DEFAULT '0' NOT NULL,
  PRIMARY KEY (itemid,clock,ns)
);
  • Change the owner of the history table
alter table history owner to zabbix;
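  • Restore the valid rows (history already exists at this point, so INSERT ... SELECT is used; SELECT ... INTO would try to create the table)
insert into history select * from history_bak where itemid>0;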
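
Before restarting the server it is worth checking that the copy moved exactly the rows matching `itemid>0`. The sketch below rehearses the step on an in-memory SQLite database as a stand-in for PostgreSQL, with invented sample rows. It copies with INSERT ... SELECT, which fills an already created table (SELECT ... INTO would instead try to create the table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the renamed table; constraints are omitted so the bad row can exist.
conn.execute(
    "CREATE TABLE history_bak (itemid bigint, clock integer, value double precision, ns integer)"
)
# Stand-in for the freshly created table, with the same constraints as schema.sql.
conn.execute(
    """CREATE TABLE history (
           itemid bigint NOT NULL,
           clock  integer DEFAULT 0 NOT NULL,
           value  double precision DEFAULT 0.0 NOT NULL,
           ns     integer DEFAULT 0 NOT NULL,
           PRIMARY KEY (itemid, clock, ns))"""
)
conn.executemany(
    "INSERT INTO history_bak VALUES (?, ?, ?, ?)",
    [
        (10073, 1712300000, 0.5, 0),  # invented sample rows
        (10073, 1712300060, 0.7, 0),
        (None, 0, 0.0, 0),            # simulated abnormal row
    ],
)

# Copy only the valid rows, inside one transaction.
with conn:
    conn.execute(
        "INSERT INTO history (itemid, clock, value, ns) "
        "SELECT itemid, clock, value, ns FROM history_bak WHERE itemid > 0"
    )

expected = conn.execute("SELECT count(*) FROM history_bak WHERE itemid > 0").fetchone()[0]
copied = conn.execute("SELECT count(*) FROM history").fetchone()[0]
print(expected, copied)
```

On the real database the two counts correspond to `select count(*) from history_bak where itemid>0;` and `select count(*) from history;`, which should both return 3857658 given the figures in the Cause section.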
  • Start zabbix-server
service zabbix6_server start
  • zabbix_server.log
96101:20240406:134111.061 executing housekeeper
96101:20240406:134111.061 zbx_setproctitle() title:'housekeeper [connecting to the database]'
96101:20240406:134111.063 zbx_setproctitle() title:'housekeeper [removing old history and trends]'
96101:20240406:134112.198 zbx_setproctitle() title:'housekeeper [removing old problems]'
96101:20240406:134112.199 zbx_setproctitle() title:'housekeeper [removing old events]'
96101:20240406:134112.201 zbx_setproctitle() title:'housekeeper [removing old sessions]'
96101:20240406:134112.202 zbx_setproctitle() title:'housekeeper [removing old service alarms]'
96101:20240406:134112.237 zbx_setproctitle() title:'housekeeper [removing old audit log items]'
96101:20240406:134112.254 zbx_setproctitle() title:'housekeeper [removing old autoreg_hosts]'
96101:20240406:134112.255 zbx_setproctitle() title:'housekeeper [removing old records]'
96101:20240406:134112.278 zbx_setproctitle() title:'housekeeper [removing deleted items data]'
96101:20240406:134112.278 query [txnlev:0] [select housekeeperid,tablename,field,value from housekeeper where tablename in ('history','history_log','history_str','history_text','history_uint','trends','trends_uint','events') order by tablename]
96101:20240406:134112.281 housekeeper [deleted 23204 hist/trends, 0 items/triggers, 0 events, 2 problems, 0 sessions, 0 alarms, 0 audit, 0 autoreg_host, 0 records in 1.218292 sec, idle for 6 hour(s)]
96101:20240406:134112.281 zbx_setproctitle() title:'housekeeper [deleted 23204 hist/trends, 0 items/triggers, 0 events, 0 sessions, 0 alarms, 0 audit items, 0 autoreg_host, 0 records in 1.218292 sec, idle for 6 hour(s)]'
  • zabbix_server started normally and the housekeeper also completed normally.
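
The housekeeper summary line can also be read by a script, for example to check after each run that it completed and how much it deleted. This is a minimal sketch that assumes the summary format shown in the log above; `parse_housekeeper_summary` is a hypothetical helper, not a Zabbix function.

```python
import re

# Matches the "housekeeper [deleted ... in N sec" summary seen in the log above.
SUMMARY = re.compile(r"housekeeper \[deleted (?P<counts>.+?) in (?P<seconds>[\d.]+) sec")

def parse_housekeeper_summary(line):
    """Return ({label: deleted count}, elapsed seconds), or None if not a summary line."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    deleted = {}
    for part in m.group("counts").split(", "):
        number, label = part.split(" ", 1)  # e.g. "23204 hist/trends"
        deleted[label] = int(number)
    return deleted, float(m.group("seconds"))

# Summary line copied from the log above.
line = (
    "96101:20240406:134112.281 housekeeper [deleted 23204 hist/trends, "
    "0 items/triggers, 0 events, 2 problems, 0 sessions, 0 alarms, 0 audit, "
    "0 autoreg_host, 0 records in 1.218292 sec, idle for 6 hour(s)]"
)
deleted, seconds = parse_housekeeper_summary(line)
print(deleted["hist/trends"], deleted["problems"], seconds)
```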