Litao OK Blog

A little blog for my life.

Nginx Error Log : Too Many Open Files While Connecting to Upstream

When traffic increases, nginx logs a large number of errors while connecting to the upstream backends, for example:

2016/03/29 01:49:47 [alert] 8675#0: *297501684 socket() failed (24: Too many open files) while connecting to upstream, client: 191.109.198.194, server: api.xxxx.com, request: "POST /myapps/native HTTP/1.1", upstream: "http://127.0.0.1:9001/myapps/native", host: "api.xxxx.com"
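To get a feel for how often this alert fires, the matching lines in the error log can be counted. This is only a rough sketch: the function name `count_fd_errors` is made up for illustration, and the default log path is an assumption that may differ on your system.

```shell
# Count "Too many open files" alerts in an nginx error log.
# The default path below is an assumption; pass the real path as $1.
count_fd_errors() {
    log_file="${1:-/var/log/nginx/error.log}"
    # grep -c prints the number of matching lines (0 when there are none)
    grep -c 'Too many open files' "$log_file"
}
```

Usage would look like `count_fd_errors /path/to/error.log`.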

Check the system limit settings:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63615
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63615
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
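Note that `ulimit -a` reports the limits of the current shell, which are not necessarily the limits the running nginx processes have. On Linux, the effective limit of a process can be read from `/proc/<pid>/limits`. The sketch below assumes Linux and that `pgrep` is available; the function names are hypothetical.

```shell
# Print the soft "Max open files" limit from a /proc/<pid>/limits file.
# Fields on that line: Max open files <soft> <hard> files
soft_nofile() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Print "<pid> <soft limit>" for every running nginx process (Linux only).
nginx_nofile_limits() {
    for pid in $(pgrep nginx); do
        printf '%s %s\n' "$pid" "$(soft_nofile "/proc/$pid/limits")"
    done
}
```

Running `nginx_nofile_limits` on the server shows whether the workers really have the limit you expect.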

The open files limit is already large enough. Next, check /etc/sysctl.conf and add this parameter:

fs.file-max = 1048576

Run this command to apply it:

sudo sysctl -p
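To confirm the new system-wide ceiling and see how much of it is in use, Linux exposes `/proc/sys/fs/file-nr`, which holds three fields: allocated handles, unused handles, and the maximum. A small sketch, with a hypothetical helper name:

```shell
# Print "allocated/max" from a file in /proc/sys/fs/file-nr format
# (three fields: allocated handles, unused handles, system-wide maximum).
file_handle_usage() {
    awk '{ printf "%s/%s\n", $1, $3 }' "${1:-/proc/sys/fs/file-nr}"
}
```

Called without arguments it reads the live value; `sysctl -n fs.file-max` prints just the maximum.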

Check the nginx configuration file. The open file limit configured there turned out to be too low, so raise it:

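worker_processes  4;
# Raise the open file descriptor limit (RLIMIT_NOFILE) of each worker process
worker_rlimit_nofile 40960;
events {
    # Left commented out, so nginx uses its default (512 per worker)
    # worker_connections  1024;
}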
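As a sanity check on these numbers, the most descriptors all workers could hold together is worker_processes times worker_rlimit_nofile, and that should stay below fs.file-max. A minimal sketch of the arithmetic, with a hypothetical function name:

```shell
# Upper bound on descriptors held by all nginx workers together:
# worker_processes * worker_rlimit_nofile
max_worker_fds() {
    worker_processes="$1"
    worker_rlimit_nofile="$2"
    echo $(( worker_processes * worker_rlimit_nofile ))
}

# max_worker_fds 4 40960   prints 163840
```

With the values used here, 163840 is well under the fs.file-max of 1048576 set earlier.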

Modify the upstream configuration, setting fail_timeout to the minimum, 1s:

upstream my_api {
    server 127.0.0.1:9000 fail_timeout=1;
    server 127.0.0.1:9001 fail_timeout=1;
    server 127.0.0.1:9002 fail_timeout=1;
    server 127.0.0.1:9003 fail_timeout=1;
}

Restart nginx. Note that this is a restart, not a reload:

sudo /etc/init.d/nginx restart

Check the effect with system commands:

ss -s
top
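For a more direct view than `ss -s`, the descriptors each nginx process currently holds can be counted from `/proc/<pid>/fd`. This is a sketch that assumes Linux and `pgrep`; the function names are hypothetical, and reading another user's `/proc/<pid>/fd` normally requires root.

```shell
# Count the entries in a directory; each entry under /proc/<pid>/fd
# is one open descriptor of that process.
count_fds() {
    ls "$1" | wc -l | tr -d ' '
}

# Print "<pid> <open descriptors>" for every running nginx process.
nginx_fd_counts() {
    for pid in $(pgrep nginx); do
        printf '%s %s\n' "$pid" "$(count_fds "/proc/$pid/fd")"
    done
}
```

Comparing these counts with worker_rlimit_nofile shows how much headroom each worker has left.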

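The connection count began to drop and nginx stopped logging the error. System monitoring showed nginx's request response time falling from 400ms while the errors were occurring to 60ms. The fix worked!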
