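In a production environment, a customer's architecture required private-network connectivity between their Alibaba Cloud environment and an on-premises IDC. The link was an IPsec tunnel between a third-party Sangfor cloud appliance on the Alibaba Cloud side and a Cisco firewall on the IDC side, used mainly for scheduled backups of files and data from Alibaba Cloud to the IDC. In production, the tunnel dropped at irregular intervals for no apparent reason. Sangfor and Cisco after-sales support both investigated without finding a cause, yet manually rebooting the Sangfor device on Alibaba Cloud restored the tunnel immediately. After the network engineers on both sides came up empty, we decided to write a monitoring script: if the tunnel goes down, Python reboots the Sangfor device and the tunnel recovers, while the data transfers use a longer timeout and resumable transfer. When a fault cannot be fixed at the network level, a different approach can still solve the problem.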
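2.1 Writing the tunnel monitoring script
The Alibaba Cloud side is a public cloud with no EIP or NAT gateway configured, and all ECS instances receive business requests through a public-facing SLB, so the instances themselves cannot reach the public internet. The tunnel monitoring script needs public connectivity both to send alerts to WeChat and to operate the Sangfor device afterwards, so the check and reboot scripts are placed on the on-premises IDC side.
2.2 Rebooting the Sangfor device
- Write a Python script that operates the Sangfor device by simulating a login to its web page, mainly using the selenium module, with logging to record logs.
- Use the Alibaba Cloud ECS API to reboot the Sangfor device.
Check tunnel connectivity. If the tunnel is down, send an alert to WeChat and DingTalk, then trigger the Sangfor reboot script.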
#!/bin/bash
# Internal address probed through the tunnel
IP=10.10.10.2
dir="/sangfor/Shscripts/pdc/"
if [ ! -d ${dir} ];then
    mkdir -p ${dir}
fi
# Lock state: 1 = a reboot may be triggered, 0 = already triggered for this outage
echo 1 > ${dir}pdcping.lock
while true
do
    # Log rotation and archiving
    Time=`date +%F`
    TIME="${Time} 23:59"
    if [ "${data}" == "${TIME}" ];then
        # Rotate pingpdc.log, the file this loop actually writes to
        mkdir ${dir}${Time} && mv ${dir}pingpdc.log ${dir}${Time}-pingpdc.log
        mv ${dir}${Time}-pingpdc.log ${dir}${Time}
    fi
    # Remove archives older than 7 days
    find ${dir} -mtime +7 -type d -exec rm -rf {} \;
    find ${dir} -mtime +7 -name "*-pingpdc.log" -exec rm -rf {} \;
    data=`date +%F' '%H:%M`
    data1=`date +%F' '%H:%M:%S`
    echo "------------${data1}---------------">>${dir}pingpdc.log
    ping -c 10 ${IP} >>${dir}pingpdc.log
    if [ $? -eq 1 ];then
        # No reply: reboot once, then wait for the tunnel to recover
        STAT=`cat ${dir}pdcping.lock`
        if [ ${STAT} -eq 1 ];then
            /usr/local/python34/bin/python3 /sangfor/Pysangfor/sangfor_public.py
            echo 0 > ${dir}pdcping.lock
        else
            continue
        fi
    else
        # Replies received: re-arm the lock if a reboot had been triggered
        STAT=`cat ${dir}pdcping.lock`
        if [ ${STAT} -eq 0 ];then
            echo 1 > ${dir}pdcping.lock
        else
            continue
        fi
    fi
done
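The lock file in the script above makes the reboot fire once per outage rather than on every failed probe. That logic can be sketched in Python as a small state machine. This is a minimal sketch, not the author's code: `TunnelWatch` and `on_down` are hypothetical names, and the probe result is passed in so the logic can be exercised without a network.

```python
class TunnelWatch:
    """Edge-triggered outage handling, mirroring the pdcping.lock logic."""

    def __init__(self, on_down):
        self.armed = True        # same role as writing 1 to the lock file
        self.on_down = on_down   # hypothetical callback: alert and reboot

    def update(self, ping_ok):
        """Feed one probe result; return True if the callback fired."""
        if not ping_ok and self.armed:
            self.on_down()
            self.armed = False   # suppress repeats during the same outage
            return True
        if ping_ok and not self.armed:
            self.armed = True    # tunnel is back: re-arm for the next outage
        return False


events = []
watch = TunnelWatch(on_down=lambda: events.append("reboot"))
# up, down, still down, recovered, down again
results = [watch.update(ok) for ok in (True, False, False, True, False)]
print(results)  # [False, True, False, False, True]
```

The callback fires on the first failed probe of each outage only, which is why two consecutive failures produce a single reboot.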
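To guard against the tunnel check script itself failing, a second script watches the check script and is run periodically by a cron job. If the check script is not running, it is started again.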
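#!/bin/bash
# Count running pdc.sh processes. Matching on "/pdc/pdc.sh" keeps this script
# (checkpdc.sh) and the search itself out of the count.
num=$(pgrep -f "/pdc/pdc.sh" | wc -l)
if [ ${num} -lt 1 ];then
    # Start it directly: an "&" stored in a variable is passed as a literal argument
    /usr/bin/nohup /bin/bash /sangfor/Shscripts/pdc/pdc.sh >/dev/null 2>&1 &
fi
Used together with a cron job: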
* * * * * /bin/bash /sangfor/Shscripts/pdc/checkpdc.sh
yum -y install zlib-devel zlib readline-devel openssl-devel wget gcc-c++ Xvfb lrzsz firefox
cd /tmp
wget -c https://www.python.org/ftp/python/3.4.5/Python-3.4.5.tgz
tar -zxvf Python-3.4.5.tgz
cd Python-3.4.5
./configure --prefix=/usr/local/python34
make && make install
echo "export PATH=$PATH:/usr/local/python34/bin" >/etc/profile.d/python34.sh
source /etc/profile.d/python34.sh
cd /tmp
wget https://bootstrap.pypa.io/get-pip.py
python3 get-pip.py
pip3 install selenium
pip3 install pyvirtualdisplay
pip3 install xvfbwrapper
cd /tmp
wget -c https://github.com/mozilla/geckodriver/releases/download/v0.16.1/geckodriver-v0.16.1-linux64.tar.gz
tar zxvf geckodriver-v0.16.1-linux64.tar.gz
cp geckodriver /usr/bin/
cat > /sangfor/Pysangfor/sangfor_public.py<
Operating the Sangfor device through the Alibaba Cloud ECS API
#!/bin/env python3
# -*- coding:UTF-8 -*-
# _author:kaliarch
from aliyunsdkcore import client
from aliyunsdkecs.request.v20140526 import RebootInstanceRequest,StartInstanceRequest,StopInstanceRequest
import time
import os
import logging

class ecsOper():
    def __init__(self,logger):
        # AccessKey ID and secret are left blank here, as in the source
        self.clentoper = client.AcsClient('', ' ', 'cn-hangzhou')
        self.logger = logger
        self.logger.info("------------------------start reboot *** ecs of API log-------------")

    def reboot_instance(self):
        # Set parameters
        request = RebootInstanceRequest.RebootInstanceRequest()
        request.set_accept_format('json')
        request.add_query_param('InstanceId', 'i-bpxxzx1rlsgvclq79au')
        # Send the request
        response = self.clentoper.do_action_with_exception(request)
        self.logger.info("public ecs *** reboot successful!")
        self.logger.info(response)
        print(response)

    def start_instance(self):
        request = StartInstanceRequest.StartInstanceRequest()
        request.set_accept_format('json')
        request.add_query_param('InstanceId', 'i-bpxxzx1rlsgvclq79au')
        # Send the request
        response = self.clentoper.do_action_with_exception(request)
        self.logger.info("public ecs *** start successful!")
        self.logger.info(response)
        print(response)

    def stop_instance(self):
        request = StopInstanceRequest.StopInstanceRequest()
        request.set_accept_format('json')
        request.add_query_param('InstanceId', 'i-bp1djzd1rlsgvclq79au')
        request.add_query_param('ForceStop', 'false')
        # Send the request
        response = self.clentoper.do_action_with_exception(request)
        self.logger.info(response)
        print(response)

    def testlog(self):
        self.logger.info("public test log")

class Glp_Log:
    def __init__(self,filename):
        self.filename = filename

    def createDir(self):
        # Build a date-prefixed log path under ./publiclog, creating the directory if needed
        _LOGDIR = os.path.join(os.path.dirname(__file__), 'publiclog')
        print(_LOGDIR)
        _TIME = time.strftime('%Y-%m-%d', time.gmtime()) + '-'
        _LOGNAME = _TIME + self.filename
        print(_LOGNAME)
        LOGFILENAME = os.path.join(_LOGDIR, _LOGNAME)
        print(LOGFILENAME)
        if not os.path.exists(_LOGDIR):
            os.mkdir(_LOGDIR)
        return LOGFILENAME

    def createlogger(self,logfilename):
        logger= logging.getLogger()
        logger.setLevel(logging.INFO)
        handler = logging.FileHandler(logfilename)
        handler.setLevel(logging.INFO)
        formater = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        handler.setFormatter(formater)
        logger.addHandler(handler)
        return logger

if __name__ == "__main__":
    glploger = Glp_Log('public-***.log')
    logfilename = glploger.createDir()
    logger = glploger.createlogger(logfilename)
    app = ecsOper(logger)
    app.reboot_instance()
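After the reboot request returns, it can be useful to confirm that the tunnel actually came back. The sketch below polls a probe until it succeeds or a deadline passes. It is an illustration under assumptions: `ping_once` and `wait_for_recovery` are hypothetical helpers that are not part of the original scripts, the timeout and interval values are arbitrary, and the `ping` flags are those of the Linux iputils version.

```python
import subprocess
import time


def ping_once(ip):
    """Return True if a single ICMP echo to ip gets a reply (Linux ping)."""
    status = subprocess.call(
        ["ping", "-c", "1", "-W", "2", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return status == 0


def wait_for_recovery(probe, timeout=600, interval=10,
                      sleep=time.sleep, clock=time.monotonic):
    """Call probe() repeatedly until it returns True or timeout seconds pass.

    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A call such as `wait_for_recovery(lambda: ping_once("10.10.10.2"))` would probe the internal address used by the check script and return `False` if the tunnel is still down after ten minutes.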
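The check script's logs are rotated and only 7 days of logs are kept, which prevents them from growing large enough to take up too much disk space.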
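On the Python side, the standard library can provide the same daily rotation with seven days of retention. A minimal sketch, assuming a hypothetical helper `make_rotating_logger` and a caller-supplied log path; the original script uses a plain `FileHandler` with a date-stamped file name instead.

```python
import logging
from logging.handlers import TimedRotatingFileHandler


def make_rotating_logger(path, name="sangfor"):
    """Return a logger whose file rotates at midnight, keeping 7 old files."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # backupCount=7 deletes rotated files beyond the seven most recent
    handler = TimedRotatingFileHandler(path, when="midnight", backupCount=7)
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

With this handler the process does not need its own archiving step, at the cost of rotation only happening while the process is running and logging.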
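WeChat alert message
DingTalk alert message
Sangfor reboot log from the Python script
This gives a simple form of fault self-healing. The same approach can be combined with many other services, for example simple application restarts.
Reposted from: https://blog.51cto.com/kaliarch/2095178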