Saturday, June 14, 2014

Suricata IDS/IPS - TCP segment pool size preallocation


In the default suricata.yaml stream section we have:
stream:
  memcap: 32mb
  checksum-validation: no      # reject wrong csums
  async-oneside: true
  midstream: true
  inline: no                  # auto will use inline mode in IPS mode, yes or no set it statically
  reassembly:
    memcap: 64mb
    depth: 1mb                  # reassemble 1mb into a stream
    toserver-chunk-size: 2560
    toclient-chunk-size: 2560
    randomize-chunk-size: yes
    #randomize-chunk-range: 10
    #raw: yes
    #chunk-prealloc: 250
    #segments:
    #  - size: 4
    #    prealloc: 256
    #  - size: 16
    #    prealloc: 512
    #  - size: 112
    #    prealloc: 512
    #  - size: 248
    #    prealloc: 512
    #  - size: 512
    #    prealloc: 512
    #  - size: 768
    #    prealloc: 1024
    #  - size: 1448
    #    prealloc: 1024
    #  - size: 65535
    #    prealloc: 128


So what are these segment preallocations for?
Let's have a look. When Suricata exits (or receives kill -15 PidOfSuricata), it writes a lot of useful statistics to the suricata.log file (you can enable that log in suricata.yaml and start Suricata with the "-v" (verbose) switch).
The example below shows such exit stats.
   
tail -20 StatsByDate/suricata-2014-06-01.log
[24344] 1/6/2014 -- 01:45:52 - (source-af-packet.c:1810) <Info> (ReceiveAFPThreadExitStats) -- (AFPacketeth314) Packets 7317661624, bytes 6132661347126
[24344] 1/6/2014 -- 01:45:52 - (stream-tcp.c:4643) <Info> (StreamTcpExitPrintStats) -- Stream TCP processed 3382528539 TCP packets
[24345] 1/6/2014 -- 01:45:52 - (source-af-packet.c:1807) <Info> (ReceiveAFPThreadExitStats) -- (AFPacketeth315) Kernel: Packets 8049357450, dropped 352658715
[24345] 1/6/2014 -- 01:45:52 - (source-af-packet.c:1810) <Info> (ReceiveAFPThreadExitStats) -- (AFPacketeth315) Packets 7696486934, bytes 6666577738944
[24345] 1/6/2014 -- 01:45:52 - (stream-tcp.c:4643) <Info> (StreamTcpExitPrintStats) -- Stream TCP processed 3357321803 TCP packets
[24346] 1/6/2014 -- 01:45:52 - (source-af-packet.c:1807) <Info> (ReceiveAFPThreadExitStats) -- (AFPacketeth316) Kernel: Packets 7573051188, dropped 292897219
[24346] 1/6/2014 -- 01:45:52 - (source-af-packet.c:1810) <Info> (ReceiveAFPThreadExitStats) -- (AFPacketeth316) Packets 7279948375, bytes 6046562324948
[24346] 1/6/2014 -- 01:45:52 - (stream-tcp.c:4643) <Info> (StreamTcpExitPrintStats) -- Stream TCP processed 3454330660 TCP packets
[24329] 1/6/2014 -- 01:45:53 - (stream-tcp-reassemble.c:502) <Info> (StreamTcpReassembleFree) -- TCP segment pool of size 4 had a peak use of 60778 segments, more than the prealloc setting of 256
[24329] 1/6/2014 -- 01:45:53 - (stream-tcp-reassemble.c:502) <Info> (StreamTcpReassembleFree) -- TCP segment pool of size 16 had a peak use of 314953 segments, more than the prealloc setting of 512
[24329] 1/6/2014 -- 01:45:53 - (stream-tcp-reassemble.c:502) <Info> (StreamTcpReassembleFree) -- TCP segment pool of size 112 had a peak use of 113739 segments, more than the prealloc setting of 512
[24329] 1/6/2014 -- 01:45:53 - (stream-tcp-reassemble.c:502) <Info> (StreamTcpReassembleFree) -- TCP segment pool of size 248 had a peak use of 17893 segments, more than the prealloc setting of 512
[24329] 1/6/2014 -- 01:45:53 - (stream-tcp-reassemble.c:502) <Info> (StreamTcpReassembleFree) -- TCP segment pool of size 512 had a peak use of 31787 segments, more than the prealloc setting of 512
[24329] 1/6/2014 -- 01:45:53 - (stream-tcp-reassemble.c:502) <Info> (StreamTcpReassembleFree) -- TCP segment pool of size 768 had a peak use of 30769 segments, more than the prealloc setting of 1024
[24329] 1/6/2014 -- 01:45:53 - (stream-tcp-reassemble.c:502) <Info> (StreamTcpReassembleFree) -- TCP segment pool of size 1448 had a peak use of 89446 segments, more than the prealloc setting of 1024
[24329] 1/6/2014 -- 01:45:53 - (stream-tcp-reassemble.c:502) <Info> (StreamTcpReassembleFree) -- TCP segment pool of size 65535 had a peak use of 81214 segments, more than the prealloc setting of 128
[24329] 1/6/2014 -- 01:45:53 - (stream.c:182) <Info> (StreamMsgQueuesDeinit) -- TCP segment chunk pool had a peak use of 20306 chunks, more than the prealloc setting of 250
[24329] 1/6/2014 -- 01:45:53 - (host.c:245) <Info> (HostPrintStats) -- host memory usage: 390144 bytes, maximum: 16777216
[24329] 1/6/2014 -- 01:45:55 - (detect.c:3890) <Info> (SigAddressCleanupStage1) -- cleaning up signature grouping structure... complete
[24329] 1/6/2014 -- 01:45:55 - (util-device.c:185) <Notice> (LiveDeviceListClean) -- Stats for 'eth3':  pkts: 124068935209, drop: 5245430626 (4.23%), invalid chksum: 0
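
If the exit log is long, the pool statistics can be isolated with a simple grep (same log file as above):

# show only the segment pool and chunk pool peak-use lines
grep "TCP segment" StatsByDate/suricata-2014-06-01.log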


Notice all the "TCP segment pool" messages. These are the actual TCP segment pool reassembly stats for the period that Suricata was running. We can adjust the suricata.yaml accordingly (compare with the default settings above); see also the rough one-liner after the config below for extracting these size/peak pairs automatically:
   
stream:
  memcap: 14gb
  checksum-validation: no      # reject wrong csums
  midstream: false
  prealloc-sessions: 375000
  inline: no                  # auto will use inline mode in IPS mode, yes or no set it statically
  reassembly:
    memcap: 20gb
    depth: 12mb                  # reassemble 12mb into a stream
    toserver-chunk-size: 2560
    toclient-chunk-size: 2560
    randomize-chunk-size: yes
    #randomize-chunk-range: 10
    raw: yes
    chunk-prealloc: 20556
    segments:
      - size: 4
        prealloc: 61034
      - size: 16
        prealloc: 315465
      - size: 112
        prealloc: 114251
      - size: 248
        prealloc: 18405
      - size: 512
        prealloc: 30769
      - size: 768
        prealloc: 31793
      - size: 1448
        prealloc: 90470
      - size: 65535
        prealloc: 81342
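
Rather than copying the numbers over by hand, the size/peak pairs can be turned into a ready-to-paste "segments:" stanza with a rough awk sketch (this assumes the exact log wording shown above and uses the observed peak as-is - round it up for some headroom, as in the config above):

# build a "segments:" stanza from the peak-use lines in the exit stats
grep "TCP segment pool" StatsByDate/suricata-2014-06-01.log | \
  awk '{ for (i = 1; i <= NF; i++) { if ($i == "size") s = $(i+1); if ($i == "use") p = $(i+2) }
         printf "      - size: %d\n        prealloc: %d\n", s, p }'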
   


   
The total RAM consumption for these preallocations (reserved out of the stream.reassembly.memcap value) would be:

4*61034 + 16*315465 + 112*114251 + 248*18405 + 512*30769 + 768*31793 + 1448*90470 + 65535*81342 
= 5524571410 bytes
= 5.14 GB of RAM
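
A quick way to double-check that figure from the shell:

# total bytes reserved by the segment preallocations above
echo $(( 4*61034 + 16*315465 + 112*114251 + 248*18405 + 512*30769 + 768*31793 + 1448*90470 + 65535*81342 ))
# prints 5524571410

# the same figure in GB
echo "scale=2; 5524571410 / 1024^3" | bc
# prints 5.14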

So we can preallocate the TCP segments, taking the Suricata tuning a step further and improving performance as well.

Now, when you start Suricata with the "-v" switch using the specific setup described above, you should see something like this in your suricata.log:
...
...
[30709] 1/6/2014 -- 12:17:34 - (stream-tcp-reassemble.c:425) <Info> (StreamTcpReassemblyConfig) -- segment pool: pktsize 4, prealloc 61034
[30709] 1/6/2014 -- 12:17:34 - (stream-tcp-reassemble.c:425) <Info> (StreamTcpReassemblyConfig) -- segment pool: pktsize 16, prealloc 315465
[30709] 1/6/2014 -- 12:17:34 - (stream-tcp-reassemble.c:425) <Info> (StreamTcpReassemblyConfig) -- segment pool: pktsize 112, prealloc 114251
[30709] 1/6/2014 -- 12:17:34 - (stream-tcp-reassemble.c:425) <Info> (StreamTcpReassemblyConfig) -- segment pool: pktsize 248, prealloc 18405
[30709] 1/6/2014 -- 12:17:34 - (stream-tcp-reassemble.c:425) <Info> (StreamTcpReassemblyConfig) -- segment pool: pktsize 512, prealloc 30769
[30709] 1/6/2014 -- 12:17:34 - (stream-tcp-reassemble.c:425) <Info> (StreamTcpReassemblyConfig) -- segment pool: pktsize 768, prealloc 31793
[30709] 1/6/2014 -- 12:17:35 - (stream-tcp-reassemble.c:425) <Info> (StreamTcpReassemblyConfig) -- segment pool: pktsize 1448, prealloc 90470
[30709] 1/6/2014 -- 12:17:35 - (stream-tcp-reassemble.c:425) <Info> (StreamTcpReassemblyConfig) -- segment pool: pktsize 65535, prealloc 81342
[30709] 1/6/2014 -- 12:17:35 - (stream-tcp-reassemble.c:461) <Info> (StreamTcpReassemblyConfig) -- stream.reassembly "chunk-prealloc": 20556
...
...

NOTE:
The 5.14 GB of RAM in this example is preallocated (taken) out of the stream.reassembly.memcap value. In other words, it does not consume an additional 5.14 GB of RAM on top of that memcap.

So be careful when setting up preallocation so that you do not preallocate more than the memcap you have configured.
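
A minimal sanity check before committing the config (a sketch using the 20gb reassembly memcap from the config below):

# bytes reserved by the segment preallocations (from the calculation above)
prealloc=5524571410
# stream.reassembly.memcap of 20gb expressed in bytes
memcap=$(( 20 * 1024 * 1024 * 1024 ))
if [ "$prealloc" -gt "$memcap" ]; then
    echo "segment prealloc exceeds the reassembly memcap - lower the counts"
else
    echo "segment prealloc fits within the reassembly memcap"
fi
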
In my case, the 10Gbps suricata.yaml config had:

stream:
  memcap: 14gb
  checksum-validation: no      # reject wrong csums
  midstream: false
  prealloc-sessions: 375000
  inline: no                  # auto will use inline mode in IPS mode, yes or no set it statically
  reassembly:
    memcap: 20gb
    depth: 12mb                  # reassemble 12mb into a stream


What this helps with is lowering CPU usage/contention for TCP segment allocation during reassembly - the segments are already preallocated, and Suricata simply uses them instead of creating them every time they are needed. It also helps minimize the initial drops during startup.

Highly adaptable and flexible.







