posix_memalign error

Anoop Alias anoopalias01 at gmail.com
Fri Aug 10 06:42:20 UTC 2018


I may have found the root cause of this issue. Many thanks to Igor for the
valuable input.

The issue had to do with the fact that I was calling nginx -s reload from
the Python subprocess module,

and I believe the error was coming from the fork() in Python:


https://stackoverflow.com/questions/1367373/python-subprocess-popen-oserror-errno-12-cannot-allocate-memory
https://stackoverflow.com/questions/5306075/python-memory-allocation-error-using-subprocess-popen

As Igor suggested, I changed the subprocess call in Python to a signal
(SIGHUP to the master process), and the logs have not shown the ENOMEM error
anymore, at least in the past 12+ hours.
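
For anyone hitting the same thing, the change was roughly the following (a
minimal sketch; the pid file path and the exact old subprocess call are my
assumptions, not the actual panel code):

import os
import signal

# Old approach: subprocess forks the (large) Python process and execs a new
# nginx binary, which parses the whole configuration before signalling the
# master -- both the fork and the config parse can fail with ENOMEM.
# subprocess.check_call(['/usr/sbin/nginx', '-s', 'reload'])

# New approach: read the master pid and send SIGHUP directly, so no fork of
# the Python process and no extra config-parsing nginx is needed.
with open('/var/run/nginx.pid') as f:  # assumed pid file location
    master_pid = int(f.read().strip())

os.kill(master_pid, signal.SIGHUP)  # nginx master reloads its configuration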

The memory usage for ~10k virtual hosts is still a bit high, but things
do work as expected and there are no more errors.

Thank you all again

On Wed, Aug 8, 2018 at 9:03 AM Anoop Alias <anoopalias01 at gmail.com> wrote:

> Hi Igor,
>
> Yes, the server runs other software, including httpd with a similar number
> of vhosts
>
> # grep "<VirtualHost" /etc/apache2/conf/httpd.conf|wc -l
> 5168
>
> I haven't found any memory-related issues with the other software in the
> logs
>
> In fact, httpd (event MPM) uses less memory to load a similar config:
>
> # ps_mem| head -1 && ps_mem |grep httpd
>  Private  +   Shared  =  RAM used       Program
> 585.6 MiB + 392.0 MiB = 977.6 MiB       httpd (63)
>
> # ps_mem| head -1 && ps_mem |grep nginx
>  Private  +   Shared  =  RAM used       Program
> 999.8 MiB +   2.5 GiB =   3.5 GiB       nginx (3)
>
> The server is a shared hosting one and runs CloudLinux, but as far as I
> know CloudLinux applies limits only to user-level processes and not to nginx
>
> The nginx HUP is needed as it is triggered by changes in the Apache
> configuration and nginx needs to reload the new config. For log file
> reopening, SIGUSR1 is used
>
>
>
>
>
> On Tue, Aug 7, 2018 at 5:50 PM Igor A. Ippolitov <iippolitov at nginx.com>
> wrote:
>
>> Anoop,
>>
>> I don't see any trouble with your configuration.
>> Also, if you have 120G of RAM and a single worker, the problem is not in
>> nginx.
>> Do you have other software running on the host?
>>
>> Basically, you just ran out of memory.
>>
>> You can optimize your reload though: use "service nginx reload" (or "kill
>> -SIGHUP") to reload the nginx configuration.
>> When you do nginx -s reload, you make a new nginx process parse the
>> configuration (which requires memory) and then send a signal to the running
>> master. You can avoid this overhead with the 'service' command, as it uses
>> the 'kill' approach documented in the manual page.
>>
>> On 06.08.2018 22:55, Anoop Alias wrote:
>>
>> Hi Igor,
>>
>> Config is reloaded using
>>
>> /usr/sbin/nginx -s reload
>>
>> this is invoked from a Python/shell script (nginx is installed as part of a
>> web control panel)
>>
>> The top-level Nginx config is in the gist below
>>
>> https://gist.github.com/AnoopAlias/ba5ad6749a586c7e267672ee65b32b3a
>>
>> It further includes ~8k server blocks, or more on some servers. Of these,
>> 2/3 are server {} blocks with TLS config and 1/3 are non-TLS ones
>>
>> ]# pwd
>> /etc/nginx/sites-enabled
>> # grep "server {" *|wc -l
>> 7886
>>
>> And yes, most of them are very similar and mostly proxy to an upstream httpd
>>
>> I have tried removing all the loadable modules and even tried an older
>> version of nginx, and all produce the error
>>
>>
>> # numastat -m
>>
>> Per-node system memory usage (in MBs):
>>                           Node 0          Node 1           Total
>>                  --------------- --------------- ---------------
>> MemTotal                65430.84        65536.00       130966.84
>> MemFree                  5491.26           40.89         5532.15
>> MemUsed                 59939.58        65495.11       125434.69
>> Active                  22295.61        21016.09        43311.70
>> Inactive                 8742.76         4662.48        13405.24
>> Active(anon)            16717.10        16572.19        33289.29
>> Inactive(anon)           2931.94         1388.14         4320.08
>> Active(file)             5578.50         4443.91        10022.41
>> Inactive(file)           5810.82         3274.34         9085.16
>> Unevictable                 0.00            0.00            0.00
>> Mlocked                     0.00            0.00            0.00
>> Dirty                       7.04            1.64            8.67
>> Writeback                   0.00            0.00            0.00
>> FilePages               18458.93        10413.97        28872.90
>> Mapped                    862.14          413.38         1275.52
>> AnonPages               12579.49        15264.37        27843.86
>> Shmem                    7069.52         2695.71         9765.23
>> KernelStack                18.34            3.03           21.38
>> PageTables                153.14          107.77          260.90
>> NFS_Unstable                0.00            0.00            0.00
>> Bounce                      0.00            0.00            0.00
>> WritebackTmp                0.00            0.00            0.00
>> Slab                     4830.68         2254.55         7085.22
>> SReclaimable             2061.05          921.72         2982.77
>> SUnreclaim               2769.62         1332.83         4102.45
>> AnonHugePages               4.00            2.00            6.00
>> HugePages_Total             0.00            0.00            0.00
>> HugePages_Free              0.00            0.00            0.00
>> HugePages_Surp              0.00            0.00            0.00
>>
>>
>> Thanks,
>>
>>
>>
>>
>>
>> On Mon, Aug 6, 2018 at 6:33 PM Igor A. Ippolitov <iippolitov at nginx.com>
>> wrote:
>>
>>> Anoop,
>>>
>>> I suppose most of your 10k servers are very similar, right?
>>> Please post the top-level configuration and a typical server{}.
>>>
>>> Also, how do you reload the configuration? With 'service nginx reload' or
>>> maybe other commands?
>>>
>>> It looks like you have a lot of fragmented memory and only 4GB free in
>>> the second NUMA node.
>>> So I'd say it is not surprising that you are getting errors when allocating
>>> 16k stripes.
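>>>
>>> (To see the fragmentation: each column in /proc/buddyinfo is the number of
>>> free blocks of order 0, 1, 2, ..., so with 4k pages the third column is the
>>> count of free 16k-contiguous blocks in each zone. A minimal sketch to pull
>>> that out, assuming the usual buddyinfo format:)
>>>
>>> import re
>>>
>>> # Each line: "Node N, zone NAME c0 c1 c2 ...", where cK is the number of
>>> # free blocks of order K (block size = page_size * 2**K).
>>> with open('/proc/buddyinfo') as f:
>>>     for line in f:
>>>         m = re.match(r'Node\s+(\d+),\s+zone\s+(\S+)\s+(.*)', line)
>>>         if m:
>>>             counts = m.group(3).split()
>>>             # order 2 = four contiguous 4k pages = one 16k block
>>>             print('node %s zone %-7s free 16k blocks: %s'
>>>                   % (m.group(1), m.group(2), counts[2]))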
>>>
>>> Could you please also post numastat -m output, just to make sure you have
>>> half of the memory available for the second CPU?
>>> Then we'll have a look at whether memory utilization can be optimized based
>>> on your configuration.
>>>
>>> Regards,
>>> Igor.
>>>
>>> On 04.08.2018 07:54, Anoop Alias wrote:
>>>
>>> Hi Igor,
>>>
>>> Setting vm.max_map_count to 20x the normal value did not help
>>>
>>> The issue happens on a group of servers, and among the group it shows up
>>> only on servers which have ~10k server{} blocks
>>>
>>> On servers that have a lower number of server{} blocks, the ENOMEM issue
>>> is not there
>>>
>>> Also, I find that the RAM usage of the nginx process is directly
>>> proportional to the number of server {} blocks
>>>
>>> For example, on a server having the problem:
>>>
>>> # ps_mem| head -1 && ps_mem |grep nginx
>>>  Private  +   Shared  =  RAM used       Program
>>>   1.0 GiB +   2.8 GiB =   3.8 GiB       nginx (3)
>>>
>>>
>>> That is for a single worker process with 4 threads in the thread_pool:
>>> # pstree|grep nginx
>>>         |-nginx-+-nginx---4*[{nginx}]
>>>         |       `-nginx
>>>
>>> Whatever config change I try, the memory usage seems to mostly depend on
>>> the number of server contexts defined
>>>
>>> Now, the issue mostly happens on nginx reload, when one more worker
>>> process is active in shutting-down mode
>>>
>>> I believe the memalign error is thrown by the worker being shut down;
>>> this is because the sites work after the error, and the pid mentioned
>>> in the error is already gone when I check ps
>>>
>>>
>>> # pmap 948965|grep 16K
>>> 00007f2923ff2000     16K r-x-- ngx_http_redis2_module.so
>>> 00007f2924fd7000     16K r---- libc-2.17.so
>>> 00007f2925431000     16K rw---   [ anon ]
>>> 00007f292584a000     16K rw---   [ anon ]
>>>
>>> Aug  4 05:50:00 b kernel: SysRq : Show Memory
>>> Aug  4 05:50:00 b kernel: Mem-Info:
>>> Aug  4 05:50:00 b kernel: active_anon:7757394 inactive_anon:1021319
>>> isolated_anon:0#012 active_file:3733324 inactive_file:2136476 isolated_
>>> file:0#012 unevictable:0 dirty:1766 writeback:6 wbtmp:0 unstable:0#012
>>> slab_reclaimable:2003687 slab_unreclaimable:901391#012 mapped:316734
>>> shmem:2381810 pagetables:63163 bounce:0#012 free:4851283 free_pcp:11332
>>> free_cma:0
>>> Aug  4 05:50:00 bravo kernel: Node 0 DMA free:15888kB min:8kB low:8kB
>>> high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_
>>> file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
>>> present:15972kB managed:15888kB mlocked:0kB dirty:0kB writeback:0kB
>>> mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB
>>> kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB
>>> local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
>>> all_unreclaimable? yes
>>> Aug  4 05:50:00 b kernel: lowmem_reserve[]: 0 1679 64139 64139
>>>
>>> # cat /proc/buddyinfo
>>> Node 0, zone      DMA      0      0      1      0      2      1      1      0      1      1      3
>>> Node 0, zone    DMA32   5284   6753   6677   1083    410     59      1      0      0      0      0
>>> Node 0, zone   Normal 500327 638958 406737  14690    872    106     11      0      0      0      0
>>> Node 1, zone   Normal 584840 291640    188      0      0      0      0      0      0      0      0
>>>
>>>
>>> The only correlation I see with the error is the number of server {}
>>> blocks (close to 10k), which makes the nginx process consume ~4GB of
>>> memory with a single worker process, and then a reload is done
>>>
>>>
>>>
>>>
>>> On Thu, Aug 2, 2018 at 6:02 PM Igor A. Ippolitov <iippolitov at nginx.com>
>>> wrote:
>>>
>>>> Anoop,
>>>>
>>>> There are two guesses: either the mmap allocations limit is hit, or
>>>> memory is way too fragmented.
>>>> Could you please track the number of mapped regions for a worker with
>>>> pmap, and the number of free 16k areas in the Normal zones (it is the
>>>> third number in /proc/buddyinfo)?
>>>>
>>>> You can also set vm.max_map_count to a higher number (like 20 times
>>>> higher than the default) and see if the error is gone.
>>>>
>>>> Please, let me know if increasing vm.max_map_count helps you.
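>>>>
>>>> (For reference, counting mapped regions without pmap is just counting
>>>> lines in /proc/<pid>/maps; a minimal sketch, with the worker pid as a
>>>> placeholder you would replace:)
>>>>
>>>> worker_pid = 948965  # replace with an actual nginx worker pid
>>>>
>>>> with open('/proc/%d/maps' % worker_pid) as f:
>>>>     mappings = sum(1 for _ in f)  # one line per mapped region
>>>>
>>>> with open('/proc/sys/vm/max_map_count') as f:
>>>>     limit = int(f.read())
>>>>
>>>> # If mappings approaches the limit, mmap() -- and therefore large
>>>> # posix_memalign() allocations -- can fail with ENOMEM even with free RAM.
>>>> print('%d mappings, vm.max_map_count = %d' % (mappings, limit))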
>>>>
>>>> On 02.08.2018 13:06, Anoop Alias wrote:
>>>>
>>>> Hi Igor,
>>>>
>>>> The error happens randomly
>>>>
>>>> 2018/08/02 06:52:42 [emerg] 874514#874514: posix_memalign(16, 16384)
>>>> failed (12: Cannot allocate memory)
>>>> 2018/08/02 09:42:53 [emerg] 872996#872996: posix_memalign(16, 16384)
>>>> failed (12: Cannot allocate memory)
>>>> 2018/08/02 10:16:14 [emerg] 877611#877611: posix_memalign(16, 16384)
>>>> failed (12: Cannot allocate memory)
>>>> 2018/08/02 10:16:48 [emerg] 879410#879410: posix_memalign(16, 16384)
>>>> failed (12: Cannot allocate memory)
>>>> 2018/08/02 10:17:55 [emerg] 876563#876563: posix_memalign(16, 16384)
>>>> failed (12: Cannot allocate memory)
>>>> 2018/08/02 10:20:21 [emerg] 879263#879263: posix_memalign(16, 16384)
>>>> failed (12: Cannot allocate memory)
>>>> 2018/08/02 10:20:51 [emerg] 878991#878991: posix_memalign(16, 16384)
>>>> failed (12: Cannot allocate memory)
>>>>
>>>> # date
>>>> Thu Aug  2 10:58:48 BST 2018
>>>>
>>>> ------------------------------------------
>>>> # cat /proc/buddyinfo
>>>> Node 0, zone      DMA      0      0      1      0      2      1      1      0      1      1      3
>>>> Node 0, zone    DMA32  11722  11057   4663   1647    609     72     10      7      1      0      0
>>>> Node 0, zone   Normal 755026 710760 398136  21462   1114     18      1      0      0      0      0
>>>> Node 1, zone   Normal 341295 801810 179604    256      0      0      0      0      0      0      0
>>>> -----------------------------------------
>>>>
>>>>
>>>> slabinfo - version: 2.1
>>>> # name            <active_objs> <num_objs> <objsize> <objperslab>
>>>> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata
>>>> <active_slabs> <num_slabs> <sharedavail>
>>>> SCTPv6                21     21   1536   21    8 : tunables    0    0
>>>>   0 : slabdata      1      1      0
>>>> SCTP                   0      0   1408   23    8 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> kcopyd_job             0      0   3312    9    8 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> dm_uevent              0      0   2608   12    8 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> nf_conntrack_ffffffff81acbb00  14054  14892    320   51    4 :
>>>> tunables    0    0    0 : slabdata    292    292      0
>>>> lvp_cache             36     36    224   36    2 : tunables    0    0
>>>>   0 : slabdata      1      1      0
>>>> lve_struct          4140   4140    352   46    4 : tunables    0    0
>>>>   0 : slabdata     90     90      0
>>>> fat_inode_cache        0      0    744   44    8 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> fat_cache              0      0     40  102    1 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> isofs_inode_cache      0      0    664   49    8 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> ext4_inode_cache      30     30   1088   30    8 : tunables    0    0
>>>>   0 : slabdata      1      1      0
>>>> ext4_xattr             0      0     88   46    1 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> ext4_free_data         0      0     64   64    1 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> ext4_allocation_context     32     32    128   32    1 : tunables    0
>>>>   0    0 : slabdata      1      1      0
>>>> ext4_io_end            0      0     72   56    1 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> ext4_extent_status    102    102     40  102    1 : tunables    0    0
>>>>   0 : slabdata      1      1      0
>>>> jbd2_journal_handle      0      0     48   85    1 : tunables    0
>>>> 0    0 : slabdata      0      0      0
>>>> jbd2_journal_head      0      0    112   36    1 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> jbd2_revoke_table_s    256    256     16  256    1 : tunables    0
>>>> 0    0 : slabdata      1      1      0
>>>> jbd2_revoke_record_s      0      0     32  128    1 : tunables    0
>>>> 0    0 : slabdata      0      0      0
>>>> kvm_async_pf           0      0    136   30    1 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> kvm_vcpu               0      0  18560    1    8 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> xfs_dqtrx            992    992    528   31    4 : tunables    0    0
>>>>   0 : slabdata     32     32      0
>>>> xfs_dquot           3264   3264    472   34    4 : tunables    0    0
>>>>   0 : slabdata     96     96      0
>>>> xfs_ili           4342175 4774399    152   53    2 : tunables    0
>>>> 0    0 : slabdata  90083  90083      0
>>>> xfs_inode         4915588 5486076   1088   30    8 : tunables    0
>>>> 0    0 : slabdata 182871 182871      0
>>>> xfs_efd_item        2680   2760    400   40    4 : tunables    0    0
>>>>   0 : slabdata     69     69      0
>>>> xfs_da_state        1088   1088    480   34    4 : tunables    0    0
>>>>   0 : slabdata     32     32      0
>>>> xfs_btree_cur       1248   1248    208   39    2 : tunables    0    0
>>>>   0 : slabdata     32     32      0
>>>> xfs_log_ticket     14874  15048    184   44    2 : tunables    0    0
>>>>   0 : slabdata    342    342      0
>>>> xfs_ioend          12909  13104    104   39    1 : tunables    0    0
>>>>   0 : slabdata    336    336      0
>>>> scsi_cmd_cache      5400   5652    448   36    4 : tunables    0    0
>>>>   0 : slabdata    157    157      0
>>>> ve_struct              0      0    848   38    8 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> ip6_dst_cache       1152   1152    448   36    4 : tunables    0    0
>>>>   0 : slabdata     32     32      0
>>>> RAWv6                910    910   1216   26    8 : tunables    0    0
>>>>   0 : slabdata     35     35      0
>>>> UDPLITEv6              0      0   1216   26    8 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> UDPv6                832    832   1216   26    8 : tunables    0    0
>>>>   0 : slabdata     32     32      0
>>>> tw_sock_TCPv6       1152   1376    256   32    2 : tunables    0    0
>>>>   0 : slabdata     43     43      0
>>>> TCPv6                510    510   2176   15    8 : tunables    0    0
>>>>   0 : slabdata     34     34      0
>>>> cfq_queue           3698   5145    232   35    2 : tunables    0    0
>>>>   0 : slabdata    147    147      0
>>>> bsg_cmd                0      0    312   52    4 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> mqueue_inode_cache    136    136    960   34    8 : tunables    0    0
>>>>   0 : slabdata      4      4      0
>>>> hugetlbfs_inode_cache   1632   1632    632   51    8 : tunables    0
>>>> 0    0 : slabdata     32     32      0
>>>> configfs_dir_cache   1472   1472     88   46    1 : tunables    0    0
>>>>   0 : slabdata     32     32      0
>>>> dquot                  0      0    256   32    2 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> userfaultfd_ctx_cache     32     32    128   32    1 : tunables    0
>>>> 0    0 : slabdata      1      1      0
>>>> fanotify_event_info   2336   2336     56   73    1 : tunables    0
>>>> 0    0 : slabdata     32     32      0
>>>> dio                 6171   6222    640   51    8 : tunables    0    0
>>>>   0 : slabdata    122    122      0
>>>> pid_namespace         42     42   2192   14    8 : tunables    0    0
>>>>   0 : slabdata      3      3      0
>>>> posix_timers_cache   1056   1056    248   33    2 : tunables    0    0
>>>>   0 : slabdata     32     32      0
>>>> UDP-Lite               0      0   1088   30    8 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> flow_cache          2268   2296    144   28    1 : tunables    0    0
>>>>   0 : slabdata     82     82      0
>>>> xfrm_dst_cache       896    896    576   28    4 : tunables    0    0
>>>>   0 : slabdata     32     32      0
>>>> ip_fib_alias        2720   2720     48   85    1 : tunables    0    0
>>>>   0 : slabdata     32     32      0
>>>> RAW                 3977   4224   1024   32    8 : tunables    0    0
>>>>   0 : slabdata    132    132      0
>>>> UDP                 4110   4110   1088   30    8 : tunables    0    0
>>>>   0 : slabdata    137    137      0
>>>> tw_sock_TCP         4756   5216    256   32    2 : tunables    0    0
>>>>   0 : slabdata    163    163      0
>>>> TCP                 2705   2768   1984   16    8 : tunables    0    0
>>>>   0 : slabdata    173    173      0
>>>> scsi_data_buffer    5440   5440     24  170    1 : tunables    0    0
>>>>   0 : slabdata     32     32      0
>>>> blkdev_queue         154    154   2208   14    8 : tunables    0    0
>>>>   0 : slabdata     11     11      0
>>>> blkdev_requests   4397688 4405884    384   42    4 : tunables    0
>>>> 0    0 : slabdata 104902 104902      0
>>>> blkdev_ioc         11232  11232    112   36    1 : tunables    0    0
>>>>   0 : slabdata    312    312      0
>>>> user_namespace         0      0   1304   25    8 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> sock_inode_cache   12282  12282    704   46    8 : tunables    0    0
>>>>   0 : slabdata    267    267      0
>>>> file_lock_cache    20056  20960    200   40    2 : tunables    0    0
>>>>   0 : slabdata    524    524      0
>>>> net_namespace          6      6   5056    6    8 : tunables    0    0
>>>>   0 : slabdata      1      1      0
>>>> shmem_inode_cache  16970  18952    712   46    8 : tunables    0    0
>>>>   0 : slabdata    412    412      0
>>>> Acpi-ParseExt      39491  40432     72   56    1 : tunables    0    0
>>>>   0 : slabdata    722    722      0
>>>> Acpi-State          1683   1683     80   51    1 : tunables    0    0
>>>>   0 : slabdata     33     33      0
>>>> Acpi-Namespace     11424  11424     40  102    1 : tunables    0    0
>>>>   0 : slabdata    112    112      0
>>>> task_delay_info    15336  15336    112   36    1 : tunables    0    0
>>>>   0 : slabdata    426    426      0
>>>> taskstats           1568   1568    328   49    4 : tunables    0    0
>>>>   0 : slabdata     32     32      0
>>>> proc_inode_cache  169897 190608    680   48    8 : tunables    0    0
>>>>   0 : slabdata   3971   3971      0
>>>> sigqueue            2208   2208    168   48    2 : tunables    0    0
>>>>   0 : slabdata     46     46      0
>>>> bdev_cache           792    792    896   36    8 : tunables    0    0
>>>>   0 : slabdata     22     22      0
>>>> sysfs_dir_cache    74698  74698    120   34    1 : tunables    0    0
>>>>   0 : slabdata   2197   2197      0
>>>> mnt_cache         163197 163424    256   32    2 : tunables    0    0
>>>>   0 : slabdata   5107   5107      0
>>>> filp               64607  97257    320   51    4 : tunables    0    0
>>>>   0 : slabdata   1907   1907      0
>>>> inode_cache       370744 370947    616   53    8 : tunables    0    0
>>>>   0 : slabdata   6999   6999      0
>>>> dentry            1316262 2139228    192   42    2 : tunables    0
>>>> 0    0 : slabdata  50934  50934      0
>>>> iint_cache             0      0     80   51    1 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> buffer_head       1441470 2890290    104   39    1 : tunables    0
>>>> 0    0 : slabdata  74110  74110      0
>>>> vm_area_struct    194998 196840    216   37    2 : tunables    0    0
>>>>   0 : slabdata   5320   5320      0
>>>> mm_struct           2679   2760   1600   20    8 : tunables    0    0
>>>>   0 : slabdata    138    138      0
>>>> files_cache         8680   8925    640   51    8 : tunables    0    0
>>>>   0 : slabdata    175    175      0
>>>> signal_cache        3691   3780   1152   28    8 : tunables    0    0
>>>>   0 : slabdata    135    135      0
>>>> sighand_cache       1950   2160   2112   15    8 : tunables    0    0
>>>>   0 : slabdata    144    144      0
>>>> task_xstate         8070   8658    832   39    8 : tunables    0    0
>>>>   0 : slabdata    222    222      0
>>>> task_struct         1913   2088   4080    8    8 : tunables    0    0
>>>>   0 : slabdata    261    261      0
>>>> cred_jar           31699  33936    192   42    2 : tunables    0    0
>>>>   0 : slabdata    808    808      0
>>>> anon_vma_chain    164026 168704     64   64    1 : tunables    0    0
>>>>   0 : slabdata   2636   2636      0
>>>> anon_vma           84104  84594     88   46    1 : tunables    0    0
>>>>   0 : slabdata   1839   1839      0
>>>> pid                11127  12576    128   32    1 : tunables    0    0
>>>>   0 : slabdata    393    393      0
>>>> shared_policy_node   9350   9350     48   85    1 : tunables    0    0
>>>>   0 : slabdata    110    110      0
>>>> numa_policy           62     62    264   31    2 : tunables    0    0
>>>>   0 : slabdata      2      2      0
>>>> radix_tree_node   771778 1194312    584   28    4 : tunables    0    0
>>>>   0 : slabdata  42654  42654      0
>>>> idr_layer_cache     2538   2565   2112   15    8 : tunables    0    0
>>>>   0 : slabdata    171    171      0
>>>> dma-kmalloc-8192       0      0   8192    4    8 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> dma-kmalloc-4096       0      0   4096    8    8 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> dma-kmalloc-2048       0      0   2048   16    8 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> dma-kmalloc-1024       0      0   1024   32    8 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> dma-kmalloc-512        0      0    512   32    4 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> dma-kmalloc-256        0      0    256   32    2 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> dma-kmalloc-128        0      0    128   32    1 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> dma-kmalloc-64         0      0     64   64    1 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> dma-kmalloc-32         0      0     32  128    1 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> dma-kmalloc-16         0      0     16  256    1 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> dma-kmalloc-8          0      0      8  512    1 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> dma-kmalloc-192        0      0    192   42    2 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> dma-kmalloc-96         0      0     96   42    1 : tunables    0    0
>>>>   0 : slabdata      0      0      0
>>>> kmalloc-8192         385    388   8192    4    8 : tunables    0    0
>>>>   0 : slabdata     97     97      0
>>>> kmalloc-4096        9296  10088   4096    8    8 : tunables    0    0
>>>>   0 : slabdata   1261   1261      0
>>>> kmalloc-2048       65061 133536   2048   16    8 : tunables    0    0
>>>>   0 : slabdata   8346   8346      0
>>>> kmalloc-1024       11987  21120   1024   32    8 : tunables    0    0
>>>>   0 : slabdata    660    660      0
>>>> kmalloc-512       107510 187072    512   32    4 : tunables    0    0
>>>>   0 : slabdata   5846   5846      0
>>>> kmalloc-256       160498 199104    256   32    2 : tunables    0    0
>>>>   0 : slabdata   6222   6222      0
>>>> kmalloc-192       144975 237426    192   42    2 : tunables    0    0
>>>>   0 : slabdata   5653   5653      0
>>>> kmalloc-128        36799 108096    128   32    1 : tunables    0    0
>>>>   0 : slabdata   3378   3378      0
>>>> kmalloc-96         99510 238896     96   42    1 : tunables    0    0
>>>>   0 : slabdata   5688   5688      0
>>>> kmalloc-64        7978152 8593280     64   64    1 : tunables    0
>>>> 0    0 : slabdata 134270 134270      0
>>>> kmalloc-32        2939882 3089664     32  128    1 : tunables    0
>>>> 0    0 : slabdata  24138  24138      0
>>>> kmalloc-16        172057 172288     16  256    1 : tunables    0    0
>>>>   0 : slabdata    673    673      0
>>>> kmalloc-8         109568 109568      8  512    1 : tunables    0    0
>>>>   0 : slabdata    214    214      0
>>>> kmem_cache_node      893    896     64   64    1 : tunables    0    0
>>>>   0 : slabdata     14     14      0
>>>> kmem_cache           612    612    320   51    4 : tunables    0    0
>>>>   0 : slabdata     12     12      0
>>>>
>>>> -------------------------------------------------
>>>>
>>>>
>>>> # uname -r
>>>> 3.10.0-714.10.2.lve1.5.17.1.el7.x86_64
>>>>
>>>> --------------------------------------------------------
>>>>
>>>> Core part of glances
>>>> http://i.imgur.com/La5JbQn.png
>>>> -----------------------------------------------------------
>>>>
>>>> Thank you very much for looking into this
>>>>
>>>>
>>>> On Thu, Aug 2, 2018 at 12:37 PM Igor A. Ippolitov <iippolitov at nginx.com>
>>>> wrote:
>>>>
>>>>> Anoop,
>>>>>
>>>>> I doubt this will be the solution, but may we have a look at
>>>>> /proc/buddyinfo and /proc/slabinfo at the moment when nginx can't
>>>>> allocate memory?
>>>>>
>>>>> On 02.08.2018 08:15, Anoop Alias wrote:
>>>>>
>>>>> Hi Maxim,
>>>>>
>>>>> I enabled debug logging, and the memalign call happens on nginx reloads;
>>>>> the ENOMEM happens only on some reloads (not on all of them)
>>>>>
>>>>> 2018/08/02 05:59:08 [notice] 872052#872052: signal process started
>>>>> 2018/08/02 05:59:23 [notice] 871570#871570: signal 1 (SIGHUP) received
>>>>> from 872052, reconfiguring
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: wake up, sigio 0
>>>>> 2018/08/02 05:59:23 [notice] 871570#871570: reconfiguring
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 0000000002B0DA00:16384 @16      === > the memalign call on reload
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: malloc:
>>>>> 00000000087924D0:4560
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 000000000E442E00:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: malloc:
>>>>> 0000000005650850:4096
>>>>> 20
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: bind() xxxx:443 #71
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: bind() xxxx:443 #72
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: bind() xxxx:443 #73
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: bind() xxxx:443 #74
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: add cleanup:
>>>>> 000000005340D728
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000024D3260:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000517BAF10:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 0000000053854FC0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 0000000053855FD0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 0000000053856FE0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 0000000053857FF0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: posix_memalign:
>>>>> 0000000053859000:16384 @16
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 000000005385D010:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 000000005385E020:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 000000005385F030:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536CD160:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536CE170:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536CF180:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D0190:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D11A0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D21B0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D31C0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D41D0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D51E0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D61F0:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D7200:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D8210:4096
>>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc:
>>>>> 00000000536D9220:4096
>>>>>
>>>>>
>>>>> In fact, there are a lot of such calls during a reload:
>>>>>
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA17ED00:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA1B0FF0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA1E12C0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA211590:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA243880:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA271B30:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA2A3E20:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA2D20D0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA3063E0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA334690:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA366980:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA396C50:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA3C8F40:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA3F9210:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA4294E0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA45B7D0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA489A80:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA4BBD70:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA4EA020:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA51E330:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA54C5E0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA57E8D0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA5AEBA0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA5DEE70:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA611160:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA641430:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA671700:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA6A29E0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA6D5CE0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA707FD0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA736280:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA768570:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA796820:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA7CAB30:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA7F8DE0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA82B0D0:16384 @16
>>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign:
>>>>> 00000000BA85B3A0:16384 @16
>>>>>
>>>>>
>>>>>
>>>>> What is perplexing is that the system has enough free (available) RAM:
>>>>> #############
>>>>> # free -g
>>>>>               total        used        free      shared  buff/cache   available
>>>>> Mem:            125          54          24           8          46          58
>>>>> Swap:             0           0           0
>>>>> #############
>>>>>
>>>>> # ulimit -a
>>>>> core file size          (blocks, -c) 0
>>>>> data seg size           (kbytes, -d) unlimited
>>>>> scheduling priority             (-e) 0
>>>>> file size               (blocks, -f) unlimited
>>>>> pending signals                 (-i) 514579
>>>>> max locked memory       (kbytes, -l) 64
>>>>> max memory size         (kbytes, -m) unlimited
>>>>> open files                      (-n) 1024
>>>>> pipe size            (512 bytes, -p) 8
>>>>> POSIX message queues     (bytes, -q) 819200
>>>>> real-time priority              (-r) 0
>>>>> stack size              (kbytes, -s) 8192
>>>>> cpu time               (seconds, -t) unlimited
>>>>> max user processes              (-u) 514579
>>>>> virtual memory          (kbytes, -v) unlimited
>>>>> file locks                      (-x) unlimited
>>>>>
>>>>> #########################################
>>>>>
>>>>> There is nothing else limiting memory allocation
>>>>>
>>>>> Is there any way to identify or prevent this?
>>>>>
>>>>>
>>>>> On Tue, Jul 31, 2018 at 7:08 PM Maxim Dounin <mdounin at mdounin.ru>
>>>>> wrote:
>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> On Tue, Jul 31, 2018 at 09:52:29AM +0530, Anoop Alias wrote:
>>>>>>
>>>>>> > I am repeatedly seeing errors like
>>>>>> >
>>>>>> > ######################
>>>>>> > 2018/07/31 03:46:33 [emerg] 2854560#2854560: posix_memalign(16, 16384)
>>>>>> > failed (12: Cannot allocate memory)
>>>>>> > 2018/07/31 03:54:09 [emerg] 2890190#2890190: posix_memalign(16, 16384)
>>>>>> > failed (12: Cannot allocate memory)
>>>>>> > 2018/07/31 04:08:36 [emerg] 2939230#2939230: posix_memalign(16, 16384)
>>>>>> > failed (12: Cannot allocate memory)
>>>>>> > 2018/07/31 04:24:48 [emerg] 2992650#2992650: posix_memalign(16, 16384)
>>>>>> > failed (12: Cannot allocate memory)
>>>>>> > 2018/07/31 04:42:09 [emerg] 3053092#3053092: posix_memalign(16, 16384)
>>>>>> > failed (12: Cannot allocate memory)
>>>>>> > 2018/07/31 04:42:17 [emerg] 3053335#3053335: posix_memalign(16, 16384)
>>>>>> > failed (12: Cannot allocate memory)
>>>>>> > 2018/07/31 04:42:28 [emerg] 3053937#3053937: posix_memalign(16, 16384)
>>>>>> > failed (12: Cannot allocate memory)
>>>>>> > 2018/07/31 04:47:54 [emerg] 3070638#3070638: posix_memalign(16, 16384)
>>>>>> > failed (12: Cannot allocate memory)
>>>>>> > ####################
>>>>>> >
>>>>>> > on a few servers
>>>>>> >
>>>>>> > The servers have enough memory free and the swap usage is 0, yet
>>>>>> > somehow the kernel denies the posix_memalign with ENOMEM (this is
>>>>>> > what I think is happening!)
>>>>>> >
>>>>>> > The numbers requested are always 16, 16k. This makes me suspicious
>>>>>> >
>>>>>> > I have no setting in nginx.conf that references 16k
>>>>>> >
>>>>>> > Is there any chance of finding out what requests this and why it is
>>>>>> > not fulfilled
>>>>>>
>>>>>> There are at least some buffers which default to 16k - for
>>>>>> example, ssl_buffer_size (http://nginx.org/r/ssl_buffer_size).
>>>>>>
>>>>>> You may try the debugging log to further find out where the particular
>>>>>> allocation happens; see here for details:
>>>>>>
>>>>>> http://nginx.org/en/docs/debugging_log.html
>>>>>>
>>>>>> But I don't really think it is worth the effort.  The error is pretty
>>>>>> clear, and it's better to focus on why these allocations are
>>>>>> denied.  Likely you are hitting some limit.
>>>>>>
>>>>>> --
>>>>>> Maxim Dounin
>>>>>> http://mdounin.ru/

-- 
*Anoop P Alias*