zabbix housekeeper failure

*zabbix-server stops about 30 minutes after it is started.

*The housekeeper starts running immediately before zabbix-server stops.
*zabbix_server.log
 23069:20240405:155505.424 executing housekeeper
 23069:20240405:155505.832 Got signal [signal:11(SIGSEGV),reason:1,refaddr:0x0]. Crashing ...
 23069:20240405:155505.832 ====== Fatal information: ======
 23069:20240405:155505.833 program counter not available for this architecture
 23069:20240405:155505.833 === Registers: ===
 23069:20240405:155505.833 register dump not available for this architecture
 23069:20240405:155505.833 === Backtrace: ===
 23069:20240405:155505.833 11: 0x53527b <zbx_backtrace+0x4b> at /usr/local/sbin/zabbix_server
 23069:20240405:155505.833 10: 0x535422 <zbx_log_fatal_info+0x92> at /usr/local/sbin/zabbix_server
 23069:20240405:155505.833 9: 0x535abb <zbx_set_common_signal_handlers+0x28b> at /usr/local/sbin/zabbix_server
 23069:20240405:155505.833 8: 0x826129b6e <pthread_sigmask+0x54e> at /lib/libthr.so.3
 23069:20240405:155505.833 7: 0x82612911f <pthread_setschedparam+0x83f> at /lib/libthr.so.3
 23069:20240405:155505.833 6: 0x7ffffffff2d3 <???> at ???
 23069:20240405:155505.833 5: 0x554b4d <is_uint_n_range+0x2d> at /usr/local/sbin/zabbix_server
 23069:20240405:155505.833 4: 0x3b8fb1 <housekeeper_thread+0x4c1> at /usr/local/sbin/zabbix_server
 23069:20240405:155505.833 3: 0x53757c <zbx_thread_start+0x2c> at /usr/local/sbin/zabbix_server
 23069:20240405:155505.833 2: 0x39350c <MAIN_ZABBIX_ENTRY+0x13dc> at /usr/local/sbin/zabbix_server
 23069:20240405:155505.834 1: 0x392813 <MAIN_ZABBIX_ENTRY+0x6e3> at /usr/local/sbin/zabbix_server
 23069:20240405:155505.834 0: 0x534c82 <daemon_start+0x1b2> at /usr/local/sbin/zabbix_server
 23069:20240405:155505.834 === Memory map: ===
 23069:20240405:155505.834 memory map not available for this platform
 23069:20240405:155505.834 ================================
 23036:20240405:155505.835 One child process died (PID:23069,exitcode/signal:1). Exiting ...
 23036:20240405:155505.835 PROCESS EXIT: 23069
 23045:20240405:155505.835 HA manager has been paused
 zabbix_server [23036]: Error waiting for process with PID 23069: [10] No child processes
 23045:20240405:155505.889 HA manager has been stopped
 23036:20240405:155505.896 syncing trend data...
 23036:20240405:155505.908 syncing trend data done
 23036:20240405:155505.909 Zabbix Server stopped. Zabbix 6.0.28 (revision 1f9d541d29a).
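To see at a glance which process crashed and where, the backtrace can be pulled out of the log with a small script. This is a minimal sketch that assumes the line layout shown in the log above (PID:date:time message); the function name crash_frames and the shortened sample are only for illustration.

```python
import re

# Zabbix log lines look like "PID:YYYYMMDD:HHMMSS.mmm message".
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")
# Backtrace frames look like "5: 0x554b4d <is_uint_n_range+0x2d> at /path".
FRAME_RE = re.compile(r"^(\d+): 0x[0-9a-f]+ <([^>+]+)(?:\+0x[0-9a-f]+)?> at (.+)$")


def crash_frames(log_text):
    """Return (pid, [function names]) for the first crashing process, or None."""
    crashed_pid = None
    frames = []
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw)
        if not m:
            continue  # skip lines that do not follow the PID:date:time layout
        pid, _date, _time, msg = m.groups()
        if crashed_pid is None:
            if "Crashing ..." in msg:
                crashed_pid = pid  # remember which process received the signal
            continue
        if pid != crashed_pid:
            continue  # ignore lines written by other processes
        f = FRAME_RE.match(msg)
        if f:
            frames.append(f.group(2))
    return (crashed_pid, frames) if crashed_pid else None


# Shortened copy of the log above, used only as sample input.
sample = """\
 23069:20240405:155505.424 executing housekeeper
 23069:20240405:155505.832 Got signal [signal:11(SIGSEGV),reason:1,refaddr:0x0]. Crashing ...
 23069:20240405:155505.833 === Backtrace: ===
 23069:20240405:155505.833 5: 0x554b4d <is_uint_n_range+0x2d> at /usr/local/sbin/zabbix_server
 23069:20240405:155505.833 4: 0x3b8fb1 <housekeeper_thread+0x4c1> at /usr/local/sbin/zabbix_server
 23036:20240405:155505.835 One child process died (PID:23069,exitcode/signal:1). Exiting ...
"""

pid, frames = crash_frames(sample)
print(pid, frames)
```

The frames come out innermost first, so housekeeper_thread appearing right below the crashing function is what points at the housekeeper.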

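!Cause
*The history table contained an abnormal row: one row whose itemid is empty, which count(*) includes but the itemid>0 condition does not.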

 zabbix=# select count(*) from history;
   count
 ---------
  3857659
 (1 row)

 zabbix=# select count(*) from history where itemid>0;
   count
 ---------
  3857658
 (1 row)

 zabbix=# select itemid,count(*) from history group by itemid having count(*)=1;
  itemid | count
 --------+-------
         |     1
 (1 row)
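The same comparison of counts can be tried out safely away from the production database. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL, with a simplified history table that has no NOT NULL constraint so a broken row can exist; the function name find_suspect_rows and the sample values are assumptions for illustration.

```python
import sqlite3


def find_suspect_rows(conn):
    """Compare the total row count with the rows that have a usable itemid."""
    total = conn.execute("select count(*) from history").fetchone()[0]
    valid = conn.execute(
        "select count(*) from history where itemid > 0"
    ).fetchone()[0]
    # A NULL itemid never satisfies "itemid > 0", so it has to be counted explicitly.
    nulls = conn.execute(
        "select count(*) from history where itemid is null"
    ).fetchone()[0]
    return {"total": total, "valid": valid, "null_itemid": nulls}


conn = sqlite3.connect(":memory:")
# Simplified table, deliberately without NOT NULL, so that a broken row can be stored.
conn.execute(
    "create table history (itemid bigint, clock integer, value double precision, ns integer)"
)
conn.executemany(
    "insert into history values (?, ?, ?, ?)",
    [
        (10001, 1712300000, 0.5, 0),
        (10002, 1712300001, 1.5, 0),
        (None, 0, 0.0, 0),  # the abnormal row
    ],
)

report = find_suspect_rows(conn)
print(report)
```

When total and valid differ, the difference is the number of rows that would be dropped by the itemid>0 filter.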

!Fix
*Renamed the current history table to set it aside, created a clean history table, and copied only the valid rows into it.

*Stop zabbix-server
 service zabbix6_server stop
*Start psql
 # su -l pgsql
 $ psql zabbix
 psql>
*Set the history table aside
 psql> alter table history rename to history_bak;
*Check the CREATE TABLE statement for history in /usr/local/share/zabbix6/server/database/postgresql/schema.sql
 CREATE TABLE history (
        itemid                   bigint                                    NOT NULL,
        clock                    integer         DEFAULT '0'               NOT NULL,
        value                    DOUBLE PRECISION DEFAULT '0.0000'          NOT NULL,
        ns                       integer         DEFAULT '0'               NOT NULL,
        PRIMARY KEY (itemid,clock,ns)
 );
*Create the history table
 CREATE TABLE history (
   itemid bigint NOT NULL,
   clock  integer DEFAULT '0' NOT NULL,
   value  DOUBLE PRECISION DEFAULT '0.0000' NOT NULL,
   ns     integer DEFAULT '0' NOT NULL,
   PRIMARY KEY (itemid,clock,ns)
 );
*Change the owner of the history table
 alter table history owner to zabbix;
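*Restore the valid rows (SELECT ... INTO would try to create history again and fail because the table already exists, so use INSERT ... SELECT)
 insert into history select * from history_bak where itemid>0;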
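The rename, recreate, and copy sequence can be rehearsed end to end before touching the real table. This is a rough sketch against an in-memory SQLite database rather than PostgreSQL, so the owner change is omitted; it copies rows with INSERT ... SELECT, which works on a table that already exists. The function name rebuild_history and the sample rows are hypothetical.

```python
import sqlite3

# Same column definitions as the history table in schema.sql.
CREATE_HISTORY = """
create table history (
    itemid bigint NOT NULL,
    clock  integer DEFAULT '0' NOT NULL,
    value  DOUBLE PRECISION DEFAULT '0.0000' NOT NULL,
    ns     integer DEFAULT '0' NOT NULL,
    PRIMARY KEY (itemid, clock, ns)
)
"""


def rebuild_history(conn):
    """Rename history to history_bak, recreate history, copy rows with itemid > 0."""
    conn.execute("alter table history rename to history_bak")
    conn.execute(CREATE_HISTORY)
    # Column order in history_bak matches the new table, so "select *" lines up.
    conn.execute("insert into history select * from history_bak where itemid > 0")
    conn.commit()
    kept = conn.execute("select count(*) from history").fetchone()[0]
    backed_up = conn.execute("select count(*) from history_bak").fetchone()[0]
    return kept, backed_up


conn = sqlite3.connect(":memory:")
# Stand-in for the damaged table: no constraints, one row with a NULL itemid.
conn.execute(
    "create table history (itemid bigint, clock integer, value double precision, ns integer)"
)
conn.executemany(
    "insert into history values (?, ?, ?, ?)",
    [
        (10001, 1712300000, 0.5, 0),
        (10002, 1712300001, 1.5, 0),
        (None, 0, 0.0, 0),
    ],
)

kept, backed_up = rebuild_history(conn)
print(kept, backed_up)
```

Comparing the two counts after the copy gives a quick check that only the abnormal row was left behind in history_bak.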

*Start zabbix-server
 service zabbix6_server start

*zabbix_server.log
 96101:20240406:134111.061 executing housekeeper
 96101:20240406:134111.061 zbx_setproctitle() title:'housekeeper [connecting to the database]'
 96101:20240406:134111.063 zbx_setproctitle() title:'housekeeper [removing old history and trends]'
 96101:20240406:134112.198 zbx_setproctitle() title:'housekeeper [removing old problems]'
 96101:20240406:134112.199 zbx_setproctitle() title:'housekeeper [removing old events]'
 96101:20240406:134112.201 zbx_setproctitle() title:'housekeeper [removing old sessions]'
 96101:20240406:134112.202 zbx_setproctitle() title:'housekeeper [removing old service alarms]'
 96101:20240406:134112.237 zbx_setproctitle() title:'housekeeper [removing old audit log items]'
 96101:20240406:134112.254 zbx_setproctitle() title:'housekeeper [removing old autoreg_hosts]'
 96101:20240406:134112.255 zbx_setproctitle() title:'housekeeper [removing old records]'
 96101:20240406:134112.278 zbx_setproctitle() title:'housekeeper [removing deleted items data]'
 96101:20240406:134112.278 query [txnlev:0] [select housekeeperid,tablename,field,value from housekeeper where tablename in ('history','history_log','history_str','history_text','history_uint','trends','trends_uint','events') order by tablename]
 96101:20240406:134112.281 housekeeper [deleted 23204 hist/trends, 0 items/triggers, 0 events, 2 problems, 0 sessions, 0 alarms, 0 audit, 0 autoreg_host, 0 records in 1.218292 sec, idle for 6 hour(s)]
 96101:20240406:134112.281 zbx_setproctitle() title:'housekeeper [deleted 23204 hist/trends, 0 items/triggers, 0 events, 0 sessions, 0 alarms, 0 audit items, 0 autoreg_host, 0 records in 1.218292 sec, idle for 6 hour(s)]'
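To confirm over time that the housekeeper keeps finishing normally, its summary line can be parsed into numbers. A minimal sketch, assuming the summary format shown in the log above; the function name parse_summary is made up for this example.

```python
import re

# Matches "housekeeper [deleted <counts> in <seconds> sec, idle for <n> hour(s)]".
SUMMARY_RE = re.compile(
    r"housekeeper \[deleted (.+?) in ([\d.]+) sec, idle for (\d+) hour\(s\)\]"
)


def parse_summary(line):
    """Return the deleted counts, elapsed seconds and idle hours, or None."""
    m = SUMMARY_RE.search(line)
    if not m:
        return None
    counts = {}
    for part in m.group(1).split(", "):
        number, label = part.split(" ", 1)  # e.g. "23204 hist/trends"
        counts[label] = int(number)
    return {
        "deleted": counts,
        "seconds": float(m.group(2)),
        "idle_hours": int(m.group(3)),
    }


# Summary line copied from the log above.
line = (
    " 96101:20240406:134112.281 housekeeper [deleted 23204 hist/trends, "
    "0 items/triggers, 0 events, 2 problems, 0 sessions, 0 alarms, 0 audit, "
    "0 autoreg_host, 0 records in 1.218292 sec, idle for 6 hour(s)]"
)

summary = parse_summary(line)
print(summary)
```

A line that ends in a crash instead of this summary returns None, which makes a missing summary easy to detect.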

*zabbix_server started normally, and the housekeeper also completed normally.