Latencies issues which G1GC

I am facing a continues increase in GC pauses with G1GC algorithm. The service latencies continue to grow over time.
Once this happens, i restart my service and the latencies go back to normal. Post startup, the latencies again continue to increase over time.

At the time of startup, the service latencies are around ~200ms, but within 24 hours, they go up to 350ms, and continue to increase in a linear fashion.

The increase in service latencies match the increase in GarbageCollection metrics.

Service specifications –

I am running a java application (JDK-8) on M4-2X large boxes with 50 active threads per box. Service runs at an 12GB heap. The average latency of a request is about 250ms, and the rate of incoming requests is about ~20 per second per box.

G1G1 configurations –

        <jvmarg line="-Xms12288M"/>
        <jvmarg line="-Xmx12288M"/>

        <jvmarg line="-verbose:gc" />
        <jvmarg line="-XX:+UseG1GC"/>
        <jvmarg line="-XX:+PrintGCDetails" />
        <jvmarg line="-XX:+PrintGCTimeStamps" />
        <jvmarg line="-XX:+PrintTenuringDistribution" />
        <jvmarg line="-XX:+PrintGCApplicationStoppedTime" />
        <jvmarg line="-XX:MaxGCPauseMillis=250"/>
        <jvmarg line="-XX:ParallelGCThreads=20" />
        <jvmarg line="-XX:ConcGCThreads=5" />
        <jvmarg line="-XX:-UseGCLogFileRotation"/>

GC logs –

79488.355: Total time for which application threads were stopped: 0.0005309 seconds, Stopping threads took: 0.0000593 seconds
79494.559: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 369098752 bytes, new threshold 15 (max 15)
- age   1:   64725432 bytes,   64725432 total
- age   2:    8867888 bytes,   73593320 total
- age   3:    2503592 bytes,   76096912 total
- age   4:     134344 bytes,   76231256 total
- age   5:    3729424 bytes,   79960680 total
- age   6:     212000 bytes,   80172680 total
- age   7:     172568 bytes,   80345248 total
- age   8:     175312 bytes,   80520560 total
- age   9:     282480 bytes,   80803040 total
- age  10:     160952 bytes,   80963992 total
- age  11:     140856 bytes,   81104848 total
- age  12:     153384 bytes,   81258232 total
- age  13:     123648 bytes,   81381880 total
- age  14:      76360 bytes,   81458240 total
- age  15:      63888 bytes,   81522128 total
, 2.5241014 secs]
   [Parallel Time: 2482.2 ms, GC Workers: 20]
      [GC Worker Start (ms): Min: 79494558.9, Avg: 79494567.4, Max: 79494602.1, Diff: 43.2]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 140.9, Max: 2478.3, Diff: 2478.3, Sum: 2818.8]
      [Update RS (ms): Min: 0.0, Avg: 5.3, Max: 41.9, Diff: 41.9, Sum: 106.9]
         [Processed Buffers: Min: 0, Avg: 23.2, Max: 80, Diff: 80, Sum: 465]
      [Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 4.1]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
      [Object Copy (ms): Min: 0.0, Avg: 41.9, Max: 68.7, Diff: 68.7, Sum: 837.9]
      [Termination (ms): Min: 0.0, Avg: 2282.3, Max: 2415.8, Diff: 2415.8, Sum: 45645.3]
         [Termination Attempts: Min: 1, Avg: 21.5, Max: 68, Diff: 67, Sum: 430]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.1, Sum: 1.0]
      [GC Worker Total (ms): Min: 2435.8, Avg: 2470.7, Max: 2482.0, Diff: 46.2, Sum: 49414.5]
      [GC Worker End (ms): Min: 79497037.9, Avg: 79497038.1, Max: 79497041.0, Diff: 3.1]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.9 ms]
   [Other: 40.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 37.7 ms]
      [Ref Enq: 0.8 ms]
      [Redirty Cards: 0.4 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.1 ms]
      [Free CSet: 1.3 ms]
   [Eden: 5512.0M(5512.0M)->0.0B(4444.0M) Survivors: 112.0M->128.0M Heap: 8222.2M(12.0G)->2707.5M(12.0G)]
 [Times: user=19.63 sys=0.18, real=2.53 secs]
79497.083: Total time for which application threads were stopped: 2.5252654 seconds, Stopping threads took: 0.0000914 seconds

I am looking for some help with GC configurations. On the basis of my reading, i am planning to increase the number of parallel threads to 32, set G1HeapRegionSize to 16M, and set ConcGCThreads = 8.