posix_memalign error
Anoop Alias
anoopalias01 at gmail.com
Fri Aug 10 06:42:20 UTC 2018
I may have found the root cause of this issue. Many thanks to Igor for the
valuable inputs
The issue had something to do with the fact that I was calling nginx -s
reload from python subprocess module
and I believe the error was coming from the fork in python
https://stackoverflow.com/questions/1367373/python-subprocess-popen-oserror-errno-12-cannot-allocate-memory
https://stackoverflow.com/questions/5306075/python-memory-allocation-error-using-subprocess-popen
As Igor suggested I changed the subprocess call in python to signal (SIGHUP
to master binary) and the logs don't have the ENOMEM error anymore, at
least in the past 12+ hours
The memory usage for ~10k virtual hosts are a bit high though, but things
do work as expected and no more errors
Thank you all again
On Wed, Aug 8, 2018 at 9:03 AM Anoop Alias <anoopalias01 at gmail.com> wrote:
> Hi Igor,
>
> Yes the server runs other software including httpd with a similar number
> of vhost
>
> # grep "<VirtualHost" /etc/apache2/conf/httpd.conf|wc -l
> 5168
>
> I haven't found issue with the other softwares in the logs relating to
> memory
>
> Infact httpd (event mpm) use lesser memory to load similar config
>
> # ps_mem| head -1 && ps_mem |grep httpd
> Private + Shared = RAM used Program
> 585.6 MiB + 392.0 MiB = 977.6 MiB httpd (63)
>
> # ps_mem| head -1 && ps_mem |grep nginx
> Private + Shared = RAM used Program
> 999.8 MiB + 2.5 GiB = 3.5 GiB nginx (3)
>
> The server is a shared hosting one and runs CloudLinux , but as far as I
> know ,CloudLinux applies limits to only user level process and not nginx
>
> The nginx HUP is needed as this is triggered by changes in apache
> configuration and nginx need to reload the new config . For log file reload
> SIGUSR1 is used
>
>
>
>
>
> On Tue, Aug 7, 2018 at 5:50 PM Igor A. Ippolitov <iippolitov at nginx.com>
> wrote:
>
>> Anoop,
>>
>> I don't see any troubles with your configuration.
>> Also, if you have 120G of RAM and a single worker - the problem is not in
>> nginx.
>> Do you have other software running on the host?
>>
>> Basically, you just run out of memory.
>>
>> You can optimize your reload though: use "service nginx reload" (or "kill
>> -SIGHUP") to reload nginx configuration.
>> When you do nginx -s reload - you make nginx parse configuration (and it
>> requires memory) and then send a signal to the running master. You can
>> avoid this overhead with 'service' command as it uses 'kill' documented in
>> the manual page.
>>
>> On 06.08.2018 22:55, Anoop Alias wrote:
>>
>> Hi Igor,
>>
>> Config is reloaded using
>>
>> /usr/sbin/nginx -s reload
>>
>> this is invoked from a python/shell script ( Nginx is installed on a web
>> control panel )
>>
>> The top-level Nginx config is in the gist below
>>
>> https://gist.github.com/AnoopAlias/ba5ad6749a586c7e267672ee65b32b3a
>>
>> It further includes ~8k server blocks or more in some servers. Out of
>> this 2/3 are server {} blocks with TLS config and 1/3 non-TLS ones
>>
>> ]# pwd
>> /etc/nginx/sites-enabled
>> # grep "server {" *|wc -l
>> 7886
>>
>> And yes most of them are very similar and mostly proxy to upstream httpd
>>
>> I have tried removing all the loadable modules and even tried an older
>> version of nginx and all produce the error
>>
>>
>> # numastat -m
>>
>> Per-node system memory usage (in MBs):
>> Node 0 Node 1 Total
>> --------------- --------------- ---------------
>> MemTotal 65430.84 65536.00 130966.84
>> MemFree 5491.26 40.89 5532.15
>> MemUsed 59939.58 65495.11 125434.69
>> Active 22295.61 21016.09 43311.70
>> Inactive 8742.76 4662.48 13405.24
>> Active(anon) 16717.10 16572.19 33289.29
>> Inactive(anon) 2931.94 1388.14 4320.08
>> Active(file) 5578.50 4443.91 10022.41
>> Inactive(file) 5810.82 3274.34 9085.16
>> Unevictable 0.00 0.00 0.00
>> Mlocked 0.00 0.00 0.00
>> Dirty 7.04 1.64 8.67
>> Writeback 0.00 0.00 0.00
>> FilePages 18458.93 10413.97 28872.90
>> Mapped 862.14 413.38 1275.52
>> AnonPages 12579.49 15264.37 27843.86
>> Shmem 7069.52 2695.71 9765.23
>> KernelStack 18.34 3.03 21.38
>> PageTables 153.14 107.77 260.90
>> NFS_Unstable 0.00 0.00 0.00
>> Bounce 0.00 0.00 0.00
>> WritebackTmp 0.00 0.00 0.00
>> Slab 4830.68 2254.55 7085.22
>> SReclaimable 2061.05 921.72 2982.77
>> SUnreclaim 2769.62 1332.83 4102.45
>> AnonHugePages 4.00 2.00 6.00
>> HugePages_Total 0.00 0.00 0.00
>> HugePages_Free 0.00 0.00 0.00
>> HugePages_Surp 0.00 0.00 0.00
>>
>>
>> Thanks,
>>
>>
>>
>>
>>
>> On Mon, Aug 6, 2018 at 6:33 PM Igor A. Ippolitov <iippolitov at nginx.com>
>> wrote:
>>
>>> Anoop,
>>>
>>> I suppose, most of your 10k servers are very similar, right?
>>> Please, post top level configuration and a typical server{}, please.
>>>
>>> Also, how do you reload configuration? With 'service nginx reload' or
>>> may be other commands?
>>>
>>> It looks like you have a lot of fragmented memory and only 4gb free in
>>> the second numa node.
>>> So, I'd say this is OK that you are getting errors from allocating a 16k
>>> stripes.
>>>
>>> Could you please post numastat -m output additionally. Just to make sure
>>> you have half of the memory for the second CPU.
>>> And we'll have a look if memory utilization may be optimized based on
>>> your configuration.
>>>
>>> Regards,
>>> Igor.
>>>
>>> On 04.08.2018 07:54, Anoop Alias wrote:
>>>
>>> Hi Igor,
>>>
>>> Setting vm.max_map_count to 20x the normal value did not help
>>>
>>> The issue happens on a group of servers and among the group, it shows up
>>> only in servers which have ~10k server{} blocks
>>>
>>> On servers that have lower number of server{} blocks , the ENOMEM issue
>>> is not there
>>>
>>> Also, I can find that the RAM usage of the Nginx process is directly
>>> proportional to the number of server {} blocks
>>>
>>> For example on a server having the problem
>>>
>>> # ps_mem| head -1 && ps_mem |grep nginx
>>> Private + Shared = RAM used Program
>>> 1.0 GiB + 2.8 GiB = 3.8 GiB nginx (3)
>>>
>>>
>>> That is for a single worker process with 4 threads in thread_pool
>>> # pstree|grep nginx
>>> |-nginx-+-nginx---4*[{nginx}]
>>> | `-nginx
>>>
>>> Whatever config change I try the memory usage seem to mostly depend on
>>> the number of server contexts defined
>>>
>>> Now the issue mostly happen in nginx reload ,when one more worker
>>> process will be active in shutting down mode
>>>
>>> I believe the memalign error is thrown by the worker being shutdown,
>>> this is because the sites work after the error and also the pid mentioned
>>> in the error would have gone when I check ps
>>>
>>>
>>> # pmap 948965|grep 16K
>>> 00007f2923ff2000 16K r-x-- ngx_http_redis2_module.so
>>> 00007f2924fd7000 16K r---- libc-2.17.so
>>> 00007f2925431000 16K rw--- [ anon ]
>>> 00007f292584a000 16K rw--- [ anon ]
>>>
>>> Aug 4 05:50:00 b kernel: SysRq : Show Memory
>>> Aug 4 05:50:00 b kernel: Mem-Info:
>>> Aug 4 05:50:00 b kernel: active_anon:7757394 inactive_anon:1021319
>>> isolated_anon:0#012 active_file:3733324 inactive_file:2136476 isolated_
>>> file:0#012 unevictable:0 dirty:1766 writeback:6 wbtmp:0 unstable:0#012
>>> slab_reclaimable:2003687 slab_unreclaimable:901391#012 mapped:316734
>>> shmem:2381810 pagetables:63163 bounce:0#012 free:4851283 free_pcp:11332
>>> free_cma:0
>>> Aug 4 05:50:00 bravo kernel: Node 0 DMA free:15888kB min:8kB low:8kB
>>> high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_
>>> file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
>>> present:15972kB managed:15888kB mlocked:0kB dirty:0kB writeback:0kB
>>> mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB
>>> kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB
>>> local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
>>> all_unreclaimable? yes
>>> Aug 4 05:50:00 b kernel: lowmem_reserve[]: 0 1679 64139 64139
>>>
>>> # cat /proc/buddyinfo
>>> Node 0, zone DMA 0 0 1 0 2 1 1
>>> 0 1 1 3
>>> Node 0, zone DMA32 5284 6753 6677 1083 410 59 1
>>> 0 0 0 0
>>> Node 0, zone Normal 500327 638958 406737 14690 872 106 11
>>> 0 0 0 0
>>> Node 1, zone Normal 584840 291640 188 0 0 0 0
>>> 0 0 0 0
>>>
>>>
>>> The only correlation I see in having the error is the number of
>>> server {} blocks (close to 10k) which then makes the nginx process consume
>>> ~ 4GB of mem with a single worker process and then a reload is done
>>>
>>>
>>>
>>>
>>> On Thu, Aug 2, 2018 at 6:02 PM Igor A. Ippolitov <iippolitov at nginx.com>
>>> wrote:
>>>
>>>> Anoop,
>>>>
>>>> There are two guesses: either mmap allocations limit is hit or memory
>>>> is way too fragmented.
>>>> Could you please track amount of mapped regions for a worker with pmap
>>>> and amount of 16k areas in Normal zones (it is the third number)?
>>>>
>>>> You can also set vm.max_map_count to a higher number (like 20 times
>>>> higher than default) and look if the error is gone.
>>>>
>>>> Please, let me know if increasing vm.max_map_count helps you.
>>>>
>>>> On 02.08.2018 13:06, Anoop Alias wrote:
>>>>
>>>> Hi Igor,
>>>>
>>>> The error happens randomly
>>>>
>>>> 2018/08/02 06:52:42 [emerg] 874514#874514: posix_memalign(16, 16384)
>>>> failed (12: Cannot allocate memory)
>>>> 2018/08/02 09:42:53 [emerg] 872996#872996: posix_memalign(16, 16384)
>>>> failed (12: Cannot allocate memory)
>>>> 2018/08/02 10:16:14 [emerg] 877611#877611: posix_memalign(16, 16384)
>>>> failed (12: Cannot allocate memory)
>>>> 2018/08/02 10:16:48 [emerg] 879410#879410: posix_memalign(16, 16384)
>>>> failed (12: Cannot allocate memory)
>>>> 2018/08/02 10:17:55 [emerg] 876563#876563: posix_memalign(16, 16384)
>>>> failed (12: Cannot allocate memory)
>>>> 2018/08/02 10:20:21 [emerg] 879263#879263: posix_memalign(16, 16384)
>>>> failed (12: Cannot allocate memory)
>>>> 2018/08/02 10:20:51 [emerg] 878991#878991: posix_memalign(16, 16384)
>>>> failed (12: Cannot allocate memory)
>>>>
>>>> # date
>>>> Thu Aug 2 10:58:48 BST 2018
>>>>
>>>> ------------------------------------------
>>>> # cat /proc/buddyinfo
>>>> Node 0, zone DMA 0 0 1 0 2 1 1
>>>> 0 1 1 3
>>>> Node 0, zone DMA32 11722 11057 4663 1647 609 72 10
>>>> 7 1 0 0
>>>> Node 0, zone Normal 755026 710760 398136 21462 1114 18 1
>>>> 0 0 0 0
>>>> Node 1, zone Normal 341295 801810 179604 256 0 0 0
>>>> 0 0 0 0
>>>> -----------------------------------------
>>>>
>>>>
>>>> slabinfo - version: 2.1
>>>> # name <active_objs> <num_objs> <objsize> <objperslab>
>>>> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata
>>>> <active_slabs> <num_slabs> <sharedavail>
>>>> SCTPv6 21 21 1536 21 8 : tunables 0 0
>>>> 0 : slabdata 1 1 0
>>>> SCTP 0 0 1408 23 8 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> kcopyd_job 0 0 3312 9 8 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> dm_uevent 0 0 2608 12 8 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> nf_conntrack_ffffffff81acbb00 14054 14892 320 51 4 :
>>>> tunables 0 0 0 : slabdata 292 292 0
>>>> lvp_cache 36 36 224 36 2 : tunables 0 0
>>>> 0 : slabdata 1 1 0
>>>> lve_struct 4140 4140 352 46 4 : tunables 0 0
>>>> 0 : slabdata 90 90 0
>>>> fat_inode_cache 0 0 744 44 8 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> fat_cache 0 0 40 102 1 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> isofs_inode_cache 0 0 664 49 8 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> ext4_inode_cache 30 30 1088 30 8 : tunables 0 0
>>>> 0 : slabdata 1 1 0
>>>> ext4_xattr 0 0 88 46 1 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> ext4_free_data 0 0 64 64 1 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> ext4_allocation_context 32 32 128 32 1 : tunables 0
>>>> 0 0 : slabdata 1 1 0
>>>> ext4_io_end 0 0 72 56 1 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> ext4_extent_status 102 102 40 102 1 : tunables 0 0
>>>> 0 : slabdata 1 1 0
>>>> jbd2_journal_handle 0 0 48 85 1 : tunables 0
>>>> 0 0 : slabdata 0 0 0
>>>> jbd2_journal_head 0 0 112 36 1 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> jbd2_revoke_table_s 256 256 16 256 1 : tunables 0
>>>> 0 0 : slabdata 1 1 0
>>>> jbd2_revoke_record_s 0 0 32 128 1 : tunables 0
>>>> 0 0 : slabdata 0 0 0
>>>> kvm_async_pf 0 0 136 30 1 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> kvm_vcpu 0 0 18560 1 8 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> xfs_dqtrx 992 992 528 31 4 : tunables 0 0
>>>> 0 : slabdata 32 32 0
>>>> xfs_dquot 3264 3264 472 34 4 : tunables 0 0
>>>> 0 : slabdata 96 96 0
>>>> xfs_ili 4342175 4774399 152 53 2 : tunables 0
>>>> 0 0 : slabdata 90083 90083 0
>>>> xfs_inode 4915588 5486076 1088 30 8 : tunables 0
>>>> 0 0 : slabdata 182871 182871 0
>>>> xfs_efd_item 2680 2760 400 40 4 : tunables 0 0
>>>> 0 : slabdata 69 69 0
>>>> xfs_da_state 1088 1088 480 34 4 : tunables 0 0
>>>> 0 : slabdata 32 32 0
>>>> xfs_btree_cur 1248 1248 208 39 2 : tunables 0 0
>>>> 0 : slabdata 32 32 0
>>>> xfs_log_ticket 14874 15048 184 44 2 : tunables 0 0
>>>> 0 : slabdata 342 342 0
>>>> xfs_ioend 12909 13104 104 39 1 : tunables 0 0
>>>> 0 : slabdata 336 336 0
>>>> scsi_cmd_cache 5400 5652 448 36 4 : tunables 0 0
>>>> 0 : slabdata 157 157 0
>>>> ve_struct 0 0 848 38 8 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> ip6_dst_cache 1152 1152 448 36 4 : tunables 0 0
>>>> 0 : slabdata 32 32 0
>>>> RAWv6 910 910 1216 26 8 : tunables 0 0
>>>> 0 : slabdata 35 35 0
>>>> UDPLITEv6 0 0 1216 26 8 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> UDPv6 832 832 1216 26 8 : tunables 0 0
>>>> 0 : slabdata 32 32 0
>>>> tw_sock_TCPv6 1152 1376 256 32 2 : tunables 0 0
>>>> 0 : slabdata 43 43 0
>>>> TCPv6 510 510 2176 15 8 : tunables 0 0
>>>> 0 : slabdata 34 34 0
>>>> cfq_queue 3698 5145 232 35 2 : tunables 0 0
>>>> 0 : slabdata 147 147 0
>>>> bsg_cmd 0 0 312 52 4 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> mqueue_inode_cache 136 136 960 34 8 : tunables 0 0
>>>> 0 : slabdata 4 4 0
>>>> hugetlbfs_inode_cache 1632 1632 632 51 8 : tunables 0
>>>> 0 0 : slabdata 32 32 0
>>>> configfs_dir_cache 1472 1472 88 46 1 : tunables 0 0
>>>> 0 : slabdata 32 32 0
>>>> dquot 0 0 256 32 2 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> userfaultfd_ctx_cache 32 32 128 32 1 : tunables 0
>>>> 0 0 : slabdata 1 1 0
>>>> fanotify_event_info 2336 2336 56 73 1 : tunables 0
>>>> 0 0 : slabdata 32 32 0
>>>> dio 6171 6222 640 51 8 : tunables 0 0
>>>> 0 : slabdata 122 122 0
>>>> pid_namespace 42 42 2192 14 8 : tunables 0 0
>>>> 0 : slabdata 3 3 0
>>>> posix_timers_cache 1056 1056 248 33 2 : tunables 0 0
>>>> 0 : slabdata 32 32 0
>>>> UDP-Lite 0 0 1088 30 8 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> flow_cache 2268 2296 144 28 1 : tunables 0 0
>>>> 0 : slabdata 82 82 0
>>>> xfrm_dst_cache 896 896 576 28 4 : tunables 0 0
>>>> 0 : slabdata 32 32 0
>>>> ip_fib_alias 2720 2720 48 85 1 : tunables 0 0
>>>> 0 : slabdata 32 32 0
>>>> RAW 3977 4224 1024 32 8 : tunables 0 0
>>>> 0 : slabdata 132 132 0
>>>> UDP 4110 4110 1088 30 8 : tunables 0 0
>>>> 0 : slabdata 137 137 0
>>>> tw_sock_TCP 4756 5216 256 32 2 : tunables 0 0
>>>> 0 : slabdata 163 163 0
>>>> TCP 2705 2768 1984 16 8 : tunables 0 0
>>>> 0 : slabdata 173 173 0
>>>> scsi_data_buffer 5440 5440 24 170 1 : tunables 0 0
>>>> 0 : slabdata 32 32 0
>>>> blkdev_queue 154 154 2208 14 8 : tunables 0 0
>>>> 0 : slabdata 11 11 0
>>>> blkdev_requests 4397688 4405884 384 42 4 : tunables 0
>>>> 0 0 : slabdata 104902 104902 0
>>>> blkdev_ioc 11232 11232 112 36 1 : tunables 0 0
>>>> 0 : slabdata 312 312 0
>>>> user_namespace 0 0 1304 25 8 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> sock_inode_cache 12282 12282 704 46 8 : tunables 0 0
>>>> 0 : slabdata 267 267 0
>>>> file_lock_cache 20056 20960 200 40 2 : tunables 0 0
>>>> 0 : slabdata 524 524 0
>>>> net_namespace 6 6 5056 6 8 : tunables 0 0
>>>> 0 : slabdata 1 1 0
>>>> shmem_inode_cache 16970 18952 712 46 8 : tunables 0 0
>>>> 0 : slabdata 412 412 0
>>>> Acpi-ParseExt 39491 40432 72 56 1 : tunables 0 0
>>>> 0 : slabdata 722 722 0
>>>> Acpi-State 1683 1683 80 51 1 : tunables 0 0
>>>> 0 : slabdata 33 33 0
>>>> Acpi-Namespace 11424 11424 40 102 1 : tunables 0 0
>>>> 0 : slabdata 112 112 0
>>>> task_delay_info 15336 15336 112 36 1 : tunables 0 0
>>>> 0 : slabdata 426 426 0
>>>> taskstats 1568 1568 328 49 4 : tunables 0 0
>>>> 0 : slabdata 32 32 0
>>>> proc_inode_cache 169897 190608 680 48 8 : tunables 0 0
>>>> 0 : slabdata 3971 3971 0
>>>> sigqueue 2208 2208 168 48 2 : tunables 0 0
>>>> 0 : slabdata 46 46 0
>>>> bdev_cache 792 792 896 36 8 : tunables 0 0
>>>> 0 : slabdata 22 22 0
>>>> sysfs_dir_cache 74698 74698 120 34 1 : tunables 0 0
>>>> 0 : slabdata 2197 2197 0
>>>> mnt_cache 163197 163424 256 32 2 : tunables 0 0
>>>> 0 : slabdata 5107 5107 0
>>>> filp 64607 97257 320 51 4 : tunables 0 0
>>>> 0 : slabdata 1907 1907 0
>>>> inode_cache 370744 370947 616 53 8 : tunables 0 0
>>>> 0 : slabdata 6999 6999 0
>>>> dentry 1316262 2139228 192 42 2 : tunables 0
>>>> 0 0 : slabdata 50934 50934 0
>>>> iint_cache 0 0 80 51 1 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> buffer_head 1441470 2890290 104 39 1 : tunables 0
>>>> 0 0 : slabdata 74110 74110 0
>>>> vm_area_struct 194998 196840 216 37 2 : tunables 0 0
>>>> 0 : slabdata 5320 5320 0
>>>> mm_struct 2679 2760 1600 20 8 : tunables 0 0
>>>> 0 : slabdata 138 138 0
>>>> files_cache 8680 8925 640 51 8 : tunables 0 0
>>>> 0 : slabdata 175 175 0
>>>> signal_cache 3691 3780 1152 28 8 : tunables 0 0
>>>> 0 : slabdata 135 135 0
>>>> sighand_cache 1950 2160 2112 15 8 : tunables 0 0
>>>> 0 : slabdata 144 144 0
>>>> task_xstate 8070 8658 832 39 8 : tunables 0 0
>>>> 0 : slabdata 222 222 0
>>>> task_struct 1913 2088 4080 8 8 : tunables 0 0
>>>> 0 : slabdata 261 261 0
>>>> cred_jar 31699 33936 192 42 2 : tunables 0 0
>>>> 0 : slabdata 808 808 0
>>>> anon_vma_chain 164026 168704 64 64 1 : tunables 0 0
>>>> 0 : slabdata 2636 2636 0
>>>> anon_vma 84104 84594 88 46 1 : tunables 0 0
>>>> 0 : slabdata 1839 1839 0
>>>> pid 11127 12576 128 32 1 : tunables 0 0
>>>> 0 : slabdata 393 393 0
>>>> shared_policy_node 9350 9350 48 85 1 : tunables 0 0
>>>> 0 : slabdata 110 110 0
>>>> numa_policy 62 62 264 31 2 : tunables 0 0
>>>> 0 : slabdata 2 2 0
>>>> radix_tree_node 771778 1194312 584 28 4 : tunables 0 0
>>>> 0 : slabdata 42654 42654 0
>>>> idr_layer_cache 2538 2565 2112 15 8 : tunables 0 0
>>>> 0 : slabdata 171 171 0
>>>> dma-kmalloc-8192 0 0 8192 4 8 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> dma-kmalloc-4096 0 0 4096 8 8 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> dma-kmalloc-2048 0 0 2048 16 8 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> dma-kmalloc-1024 0 0 1024 32 8 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> dma-kmalloc-512 0 0 512 32 4 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> dma-kmalloc-256 0 0 256 32 2 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> dma-kmalloc-128 0 0 128 32 1 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> dma-kmalloc-64 0 0 64 64 1 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> dma-kmalloc-32 0 0 32 128 1 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> dma-kmalloc-16 0 0 16 256 1 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> dma-kmalloc-8 0 0 8 512 1 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> dma-kmalloc-192 0 0 192 42 2 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> dma-kmalloc-96 0 0 96 42 1 : tunables 0 0
>>>> 0 : slabdata 0 0 0
>>>> kmalloc-8192 385 388 8192 4 8 : tunables 0 0
>>>> 0 : slabdata 97 97 0
>>>> kmalloc-4096 9296 10088 4096 8 8 : tunables 0 0
>>>> 0 : slabdata 1261 1261 0
>>>> kmalloc-2048 65061 133536 2048 16 8 : tunables 0 0
>>>> 0 : slabdata 8346 8346 0
>>>> kmalloc-1024 11987 21120 1024 32 8 : tunables 0 0
>>>> 0 : slabdata 660 660 0
>>>> kmalloc-512 107510 187072 512 32 4 : tunables 0 0
>>>> 0 : slabdata 5846 5846 0
>>>> kmalloc-256 160498 199104 256 32 2 : tunables 0 0
>>>> 0 : slabdata 6222 6222 0
>>>> kmalloc-192 144975 237426 192 42 2 : tunables 0 0
>>>> 0 : slabdata 5653 5653 0
>>>> kmalloc-128 36799 108096 128 32 1 : tunables 0 0
>>>> 0 : slabdata 3378 3378 0
>>>> kmalloc-96 99510 238896 96 42 1 : tunables 0 0
>>>> 0 : slabdata 5688 5688 0
>>>> kmalloc-64 7978152 8593280 64 64 1 : tunables 0
>>>> 0 0 : slabdata 134270 134270 0
>>>> kmalloc-32 2939882 3089664 32 128 1 : tunables 0
>>>> 0 0 : slabdata 24138 24138 0
>>>> kmalloc-16 172057 172288 16 256 1 : tunables 0 0
>>>> 0 : slabdata 673 673 0
>>>> kmalloc-8 109568 109568 8 512 1 : tunables 0 0
>>>> 0 : slabdata 214 214 0
>>>> kmem_cache_node 893 896 64 64 1 : tunables 0 0
>>>> 0 : slabdata 14 14 0
>>>> kmem_cache 612 612 320 51 4 : tunables 0 0
>>>> 0 : slabdata 12 12 0
>>>>
>>>> -------------------------------------------------
>>>>
>>>>
>>>> # uname -r
>>>> 3.10.0-714.10.2.lve1.5.17.1.el7.x86_64
>>>>
>>>> --------------------------------------------------------
>>>>
>>>> Core part of glances
>>>> http://i.imgur.com/La5JbQn.png
>>>> -----------------------------------------------------------
>>>>
>>>> Thank you very much for looking into this
>>>>
>>>>
>>>> On Thu, Aug 2, 2018 at 12:37 PM Igor A. Ippolitov <iippolitov at nginx.com>
>>>> wrote:
>>>>
>>>>> Anoop,
>>>>>
>>>>> I doubt this will be the solution, but may we have a look at
>>>>> /proc/buddyinfo and /proc/slabinfo the moment when nginx can't allocate
>>>>> memory?
>>>>>
>>>>> On 02.08.2018 08:15, Anoop Alias wrote:
>>>>>
>>>>> Hi Maxim,
>>>>>
>>>>> I enabled debug and the memalign call is happening on nginx reloads
>>>>> and the ENOMEM happen sometimes on the reload(not on all reloads)
>>>>>
>>>>> 2018/08/02 05:59:08 [notice] 872052#872052: signal process started
>>>>> 2018/08/02 05:59:23 [notice] 871570#871570: signal 1 (SIGHUP) received
>>>>> from 872052, reconfiguring
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: wake up, sigio 0
>>>>> 2018/08/02 05:59:23 [notice] 871570#871570: reconfiguring
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 0000000002B0DA00:16384 @16 === > the memalign call on reload
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: malloc:
>>>>> 00000000087924D0:4560
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 000000000E442E00:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: malloc:
>>>>> 0000000005650850:4096
>>>>> 20
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: bind() xxxx:443 #71
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: bind() xxxx:443 #72
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: bind() xxxx:443 #73
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: bind() xxxx:443 #74
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: add cleanup:
>>>>> 000000005340D728
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000024D3260:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000517BAF10:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 0000000053854FC0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 0000000053855FD0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 0000000053856FE0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 0000000053857FF0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: posix_memalign:
>>>>> 0000000053859000:16384 @16
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 000000005385D010:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 000000005385E020:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 000000005385F030:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536CD160:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536CE170:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536CF180:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D0190:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D11A0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D21B0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D31C0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D41D0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D51E0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D61F0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D7200:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D8210:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D9220:4096
>>>>>
>>>>>
>>>>> Infact there are lot of such calls during a reload
>>>>>
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA17ED00:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA1B0FF0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA1E12C0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA211590:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA243880:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA271B30:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA2A3E20:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA2D20D0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA3063E0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA334690:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA366980:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA396C50:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA3C8F40:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA3F9210:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA4294E0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA45B7D0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA489A80:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA4BBD70:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA4EA020:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA51E330:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA54C5E0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA57E8D0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA5AEBA0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA5DEE70:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA611160:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA641430:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA671700:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA6A29E0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA6D5CE0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA707FD0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA736280:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA768570:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA796820:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA7CAB30:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA7F8DE0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA82B0D0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA85B3A0:16384 @16
>>>>>
>>>>>
>>>>>
>>>>> What is perplexing is that the system has enough free (available RAM)
>>>>> #############
>>>>> # free -g
>>>>> total used free shared buff/cache
>>>>> available
>>>>> Mem: 125 54 24 8 46
>>>>> 58
>>>>> Swap: 0 0 0
>>>>> #############
>>>>>
>>>>> # ulimit -a
>>>>> core file size (blocks, -c) 0
>>>>> data seg size (kbytes, -d) unlimited
>>>>> scheduling priority (-e) 0
>>>>> file size (blocks, -f) unlimited
>>>>> pending signals (-i) 514579
>>>>> max locked memory (kbytes, -l) 64
>>>>> max memory size (kbytes, -m) unlimited
>>>>> open files (-n) 1024
>>>>> pipe size (512 bytes, -p) 8
>>>>> POSIX message queues (bytes, -q) 819200
>>>>> real-time priority (-r) 0
>>>>> stack size (kbytes, -s) 8192
>>>>> cpu time (seconds, -t) unlimited
>>>>> max user processes (-u) 514579
>>>>> virtual memory (kbytes, -v) unlimited
>>>>> file locks (-x) unlimited
>>>>>
>>>>> #########################################
>>>>>
>>>>> There is no other thing limiting memory allocation
>>>>>
>>>>> Any way to prevent this or probably identify/prevent this
>>>>>
>>>>>
>>>>> On Tue, Jul 31, 2018 at 7:08 PM Maxim Dounin <mdounin at mdounin.ru>
>>>>> wrote:
>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> On Tue, Jul 31, 2018 at 09:52:29AM +0530, Anoop Alias wrote:
>>>>>>
>>>>>> > I am repeatedly seeing errors like
>>>>>> >
>>>>>> > ######################
>>>>>> > 2018/07/31 03:46:33 [emerg] 2854560#2854560: posix_memalign(16,
>>>>>> 16384)
>>>>>> > failed (12: Cannot allocate memory)
>>>>>> > 2018/07/31 03:54:09 [emerg] 2890190#2890190: posix_memalign(16,
>>>>>> 16384)
>>>>>> > failed (12: Cannot allocate memory)
>>>>>> > 2018/07/31 04:08:36 [emerg] 2939230#2939230: posix_memalign(16,
>>>>>> 16384)
>>>>>> > failed (12: Cannot allocate memory)
>>>>>> > 2018/07/31 04:24:48 [emerg] 2992650#2992650: posix_memalign(16,
>>>>>> 16384)
>>>>>> > failed (12: Cannot allocate memory)
>>>>>> > 2018/07/31 04:42:09 [emerg] 3053092#3053092: posix_memalign(16,
>>>>>> 16384)
>>>>>> > failed (12: Cannot allocate memory)
>>>>>> > 2018/07/31 04:42:17 [emerg] 3053335#3053335: posix_memalign(16,
>>>>>> 16384)
>>>>>> > failed (12: Cannot allocate memory)
>>>>>> > 2018/07/31 04:42:28 [emerg] 3053937#3053937: posix_memalign(16,
>>>>>> 16384)
>>>>>> > failed (12: Cannot allocate memory)
>>>>>> > 2018/07/31 04:47:54 [emerg] 3070638#3070638: posix_memalign(16,
>>>>>> 16384)
>>>>>> > failed (12: Cannot allocate memory)
>>>>>> > ####################
>>>>>> >
>>>>>> > on a few servers
>>>>>> >
>>>>>> > The servers have enough memory free and the swap usage is 0, yet
>>>>>> somehow
>>>>>> > the kernel denies the posix_memalign with ENOMEM ( this is what I
>>>>>> think is
>>>>>> > happening!)
>>>>>> >
>>>>>> > The numbers requested are always 16, 16k . This makes me suspicious
>>>>>> >
>>>>>> > I have no setting in nginx.conf that reference a 16k
>>>>>> >
>>>>>> > Is there any chance of finding out what requests this and why this
>>>>>> is not
>>>>>> > fulfilled
>>>>>>
>>>>>> There are at least some buffers which default to 16k - for
>>>>>> example, ssl_buffer_size (http://nginx.org/r/ssl_buffer_size).
>>>>>>
>>>>>> You may try debugging log to futher find out where the particular
>>>>>> allocation happens, see here for details:
>>>>>>
>>>>>> http://nginx.org/en/docs/debugging_log.html
>>>>>>
>>>>>> But I don't really think it worth the effort. The error is pretty
>>>>>> clear, and it's better to focus on why these allocations are
>>>>>> denied. Likely you are hitting some limit.
>>>>>>
>>>>>> --
>>>>>> Maxim Dounin
>>>>>> http://mdounin.ru/
>>>>>> _______________________________________________
>>>>>> nginx mailing list
>>>>>> nginx at nginx.org
>>>>>> http://mailman.nginx.org/mailman/listinfo/nginx
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Anoop P Alias*
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> nginx mailing listnginx at nginx.orghttp://mailman.nginx.org/mailman/listinfo/nginx
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> nginx mailing list
>>>>> nginx at nginx.org
>>>>> http://mailman.nginx.org/mailman/listinfo/nginx
>>>>
>>>>
>>>>
>>>> --
>>>> *Anoop P Alias*
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> nginx mailing listnginx at nginx.orghttp://mailman.nginx.org/mailman/listinfo/nginx
>>>>
>>>>
>>>> _______________________________________________
>>>> nginx mailing list
>>>> nginx at nginx.org
>>>> http://mailman.nginx.org/mailman/listinfo/nginx
>>>
>>>
>>>
>>> --
>>> *Anoop P Alias*
>>>
>>>
>>>
>>> _______________________________________________
>>> nginx mailing listnginx at nginx.orghttp://mailman.nginx.org/mailman/listinfo/nginx
>>>
>>>
>>> _______________________________________________
>>> nginx mailing list
>>> nginx at nginx.org
>>> http://mailman.nginx.org/mailman/listinfo/nginx
>>
>>
>>
>> --
>> *Anoop P Alias*
>>
>>
>>
>> _______________________________________________
>> nginx mailing listnginx at nginx.orghttp://mailman.nginx.org/mailman/listinfo/nginx
>>
>>
>> _______________________________________________
>> nginx mailing list
>> nginx at nginx.org
>> http://mailman.nginx.org/mailman/listinfo/nginx
>
>
>
> --
> *Anoop P Alias*
>
>
--
*Anoop P Alias*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.nginx.org/pipermail/nginx/attachments/20180810/1e41ea0d/attachment-0001.html>
More information about the nginx
mailing list