Hotspot-gc-dev Archive

  #1  
27-05-2011 06:57 PM
 

I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.

Linux RHEL 5.6

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

(I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).

Startup parameters:

-server
-Xms256m
-Xmx256m
-XX:+HeapDumpOnOutOfMemoryError
-XX:MaxPermSize=64m
-verbose:gc
-XX:-UseGCOverheadLimit
-XX:+DisableExplicitGC
-XX:+UseParallelGC
-XX:+UseFastAccessorMethods
-XX:AdaptiveSizePolicyOutputInterval=1
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime

The complete GC log is available here:

http://dl.dropbox.com/u/3430279/gc.log

but here is a short snippet from that log before and after the OOM:

2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
Total time for which application threads were stopped: 1.2760680 seconds
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid18706.hprof ...
Application time: 0.9442610 seconds
Total time for which application threads were stopped: 2.4584870 seconds
Heap dump file created [83874513 bytes in 3.330 secs]
2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
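
To make the numbers in that snippet easier to compare, here is a minimal Python sketch that pulls the per-generation figures out of one log line. It assumes the JDK 6 ParallelGC -XX:+PrintGCDetails layout seen above; the names POOL_RE and parse_pools are made up for illustration and are not part of any JDK tool.

```python
import re

# One per-generation section of a ParallelGC PrintGCDetails line,
# e.g. "[PSOldGen: 62518K->52158K(174784K)]". Layout assumed from the snippet.
POOL_RE = re.compile(r"\[(PSYoungGen|PSOldGen|PSPermGen): (\d+)K->(\d+)K\((\d+)K\)\]")


def parse_pools(line):
    """Return {pool: (before_kb, after_kb, capacity_kb)} for one GC log line."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in POOL_RE.finditer(line)
    }


# The last Full GC line before the OOM, copied from the snippet.
LAST_FULL_GC = (
    "2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] "
    "[PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) "
    "[PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] "
    "[Times: user=0.32 sys=0.02, real=0.37 secs]"
)

pools = parse_pools(LAST_FULL_GC)
for name, (before, after, capacity) in pools.items():
    print(f"{name}: {after}K used of {capacity}K, {capacity - after}K free")
```

The same function can be mapped over every line of gc.log to see how occupancy and committed size move between collections.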

Note that:

1) I can reproduce this easily.

2) This seems to happen only once, shortly after startup, when load is first applied to the system -- after that everything seems fine (though I haven't yet verified this with a longer test).

3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.

4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.

5) The heap dump in java_pid18706.hprof shows only 70 MB of live objects in the heap.

6) The heap dump in java_pid18706.hprof shows no objects in the finalization queue.

7) The OutOfMemoryError in the log has no stack trace associated with it, though one is normally present.

8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
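
As a rough check on points 4 and 5, the arithmetic can be written out in a few lines of Python. All figures are copied from the last Full GC line before the OOM and from the startup parameters; the variable names are only illustrative.

```python
KB = 1024

# Figures from the last Full GC line before the OOM in the posted snippet.
old_used_kb, old_capacity_kb = 52158, 174784
perm_used_kb, perm_committed_kb = 26866, 61056
max_perm_kb = 64 * KB        # from -XX:MaxPermSize=64m
heap_dump_bytes = 83874513   # from "Heap dump file created [...]"

old_free_kb = old_capacity_kb - old_used_kb
perm_headroom_kb = max_perm_kb - perm_used_kb

print(f"old gen free:   {old_free_kb}K ({old_free_kb / KB:.1f} MB)")
print(f"perm committed: {perm_committed_kb}K, headroom to max: {perm_headroom_kb}K")
print(f"heap dump file: {heap_dump_bytes / (KB * KB):.1f} MB")
```

The dump file size is printed only for comparison with the roughly 70 MB of live objects reported in point 5; the two are not the same measurement.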

I'm not sure what else to try or where else to look. Any suggestions?

Cheers,
Raman Gupta

  #2  
01-06-2011 06:24 PM
 

Raman,

The gc.log looks like it has the young collections
filtered out. Is that right? If so, please upload
the complete log.

Jon
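
A quick way to test whether a log has lost its young collections is to count the two kinds of entries. The sketch below is Python and assumes the JDK 6 ParallelGC line shapes, where young collections start with "[GC [PSYoungGen:" and full collections with "[Full GC [PSYoungGen:"; the helper names count_collections and looks_filtered are hypothetical.

```python
import re

# Assumed line shapes for JDK 6 ParallelGC with -XX:+PrintGCDetails.
YOUNG_RE = re.compile(r"\[GC \[PSYoungGen:")
FULL_RE = re.compile(r"\[Full GC \[PSYoungGen:")


def count_collections(lines):
    """Return (young_count, full_count) for an iterable of GC log lines."""
    young = full = 0
    for line in lines:
        if FULL_RE.search(line):
            full += 1
        elif YOUNG_RE.search(line):
            young += 1
    return young, full


def looks_filtered(lines):
    """True when the log contains full collections but no young ones."""
    young, full = count_collections(lines)
    return full > 0 and young == 0


# Two lines taken from the snippet in the original post (truncated after the
# old-generation section, which is enough for counting).
sample = [
    "2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)]",
    "2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)]",
]
print(looks_filtered(sample))
```

To run it on a whole file, pass the open file object, for example looks_filtered(open("gc.log")).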

On 05/27/11 10:57, Raman Gupta wrote:
> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>
> Linux RHEL 5.6
>
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>
> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>
> Startup parameters:
>
> -server
> -Xms256m
> -Xmx256m
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:MaxPermSize=64m
> -verbose:gc
> -XX:-UseGCOverheadLimit
> -XX:+DisableExplicitGC
> -XX:+UseParallelGC
> -XX:+UseFastAccessorMethods
> -XX:AdaptiveSizePolicyOutputInterval=1
> -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCApplicationConcurrentTime
> -XX:+PrintGCApplicationStoppedTime
>
> The complete GC log is available here:
>
> http://dl.dropbox.com/u/3430279/gc.log
>
> but here is a short snippet from that log before and after the OOM:
>
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
> Total time for which application threads were stopped: 1.2760680 seconds
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid18706.hprof ...
> Application time: 0.9442610 seconds
> Total time for which application threads were stopped: 2.4584870 seconds
> Heap dump file created [83874513 bytes in 3.330 secs]
> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>
> Note that:
>
> 1) I can reproduce this easily.
>
> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>
> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>
> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>
> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>
> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>
> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>
> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>
> I'm not sure what else to try or where else to look. Any suggestions?
>
> Cheers,
> Raman Gupta

  #3  
01-06-2011 06:35 PM
 

+1, I can't see anything in these logs that indicates you're heading to an OOME due to Java heap. But the log is incomplete...

Regards,
Kirk

On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:

> Raman,
>
> The gc.log looks like it has the young collections
> filtered out. Is that right? If so, please upload
> the complete log.
>
> Jon
>
> On 05/27/11 10:57, Raman Gupta wrote:
>> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>>
>> Linux RHEL 5.6
>>
>> java version "1.6.0_24"
>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>>
>> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>>
>> Startup parameters:
>>
>> -server
>> -Xms256m
>> -Xmx256m
>> -XX:+HeapDumpOnOutOfMemoryError
>> -XX:MaxPermSize=64m
>> -verbose:gc
>> -XX:-UseGCOverheadLimit
>> -XX:+DisableExplicitGC
>> -XX:+UseParallelGC
>> -XX:+UseFastAccessorMethods
>> -XX:AdaptiveSizePolicyOutputInterval=1
>> -XX:+PrintGCDateStamps
>> -XX:+PrintGCTimeStamps
>> -XX:+PrintGCDetails
>> -XX:+PrintGCApplicationConcurrentTime
>> -XX:+PrintGCApplicationStoppedTime
>>
>> The complete GC log is available here:
>>
>> http://dl.dropbox.com/u/3430279/gc.log
>>
>> but here is a short snippet from that log before and after the OOM:
>>
>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>> Total time for which application threads were stopped: 1.2760680 seconds
>> java.lang.OutOfMemoryError: Java heap space
>> Dumping heap to java_pid18706.hprof ...
>> Application time: 0.9442610 seconds
>> Total time for which application threads were stopped: 2.4584870 seconds
>> Heap dump file created [83874513 bytes in 3.330 secs]
>> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>>
>> Note that:
>>
>> 1) I can reproduce this easily.
>>
>> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>>
>> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>>
>> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>>
>> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>>
>> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>>
>> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>>
>> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>>
>> I'm not sure what else to try or where else to look. Any suggestions?
>>
>> Cheers,
>> Raman Gupta


  #4  
02-06-2011 03:15 AM
 

My bad -- I filtered out the young GCs with an errant grep command...
this log should be ok:

http://dl.dropbox.com/u/3430279/gc.log

This is from a different test run than the one before, but
demonstrates the same "problem".

Cheers,
Raman

On 06/01/2011 01:35 PM, Charles K Pepperdine wrote:
> +1, can't see anything in these logs that indicates you're heading to a OOME due to Java heap. But, incomplete log...
>
> Regards,
> Kirk
>
> On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:
>
>> Raman,
>>
>> The gc.log looks like it has the young collections
>> filtered out. Is that right? If so, please upload
>> the complete log.
>>
>> Jon
>>
>> On 05/27/11 10:57, Raman Gupta wrote:
>>> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>>>
>>> Linux RHEL 5.6
>>>
>>> java version "1.6.0_24"
>>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>>>
>>> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>>>
>>> Startup parameters:
>>>
>>> -server
>>> -Xms256m
>>> -Xmx256m
>>> -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:MaxPermSize=64m
>>> -verbose:gc
>>> -XX:-UseGCOverheadLimit
>>> -XX:+DisableExplicitGC
>>> -XX:+UseParallelGC
>>> -XX:+UseFastAccessorMethods
>>> -XX:AdaptiveSizePolicyOutputInterval=1
>>> -XX:+PrintGCDateStamps
>>> -XX:+PrintGCTimeStamps
>>> -XX:+PrintGCDetails
>>> -XX:+PrintGCApplicationConcurrentTime
>>> -XX:+PrintGCApplicationStoppedTime
>>>
>>> The complete GC log is available here:
>>>
>>> http://dl.dropbox.com/u/3430279/gc.log
>>>
>>> but here is a short snippet from that log before and after the OOM:
>>>
>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>> Total time for which application threads were stopped: 1.2760680 seconds
>>> java.lang.OutOfMemoryError: Java heap space
>>> Dumping heap to java_pid18706.hprof ...
>>> Application time: 0.9442610 seconds
>>> Total time for which application threads were stopped: 2.4584870 seconds
>>> Heap dump file created [83874513 bytes in 3.330 secs]
>>> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>>>
>>> Note that:
>>>
>>> 1) I can reproduce this easily.
>>>
>>> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>>>
>>> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>>>
>>> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>>>
>>> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>>>
>>> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>>>
>>> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>>>
>>> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>>>
>>> I'm not sure what else to try or where else to look. Any suggestions?
>>>
>>> Cheers,
>>> Raman Gupta
>

  #5  
02-06-2011 02:20 PM

2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.65 (attempted to grow)
Tenured generation: 0.54 (attempted to grow)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.01 (attempted to grow)
Tenured generation: 0.54 (no change)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]

The first Full GC looks normal, as PSOldGen is (174784-169093)K = 5691K from being full. Nothing is happening in Perm.
The second collection (the young GC at 148.631) is where things start to get weird. I don't see why that GC was called. Stranger still, it's logged ~800ms *after* the Full GC and yet no application thread allocated any memory out of young gen.
For some reason that "failed" young gen collection triggers an immediate Full GC.
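
The arithmetic above can be reproduced directly from the log line. The sketch below is a rough illustration, not a general GC log parser: it assumes the JDK 6 ParallelGC "Full GC" format quoted in this thread, and it assumes that the uptime stamp (147.823) marks the start of the collection. The names parse_full_gc, old_gen_headroom_kb and end_uptime are hypothetical.

```python
import re

# Matches the "Full GC" line format quoted in this thread (assumed format).
FULL_GC = re.compile(
    r"(?P<uptime>\d+\.\d+): \[Full GC "
    r"\[PSYoungGen: (?P<y_before>\d+)K->(?P<y_after>\d+)K\((?P<y_cap>\d+)K\)\] "
    r"\[PSOldGen: (?P<o_before>\d+)K->(?P<o_after>\d+)K\((?P<o_cap>\d+)K\)\]"
    r".*?, (?P<pause>\d+\.\d+) secs\]"
)

FIRST_FULL_GC = (
    "2011-05-27T11:35:18.214-0400: 147.823: [Full GC "
    "[PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] "
    "176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], "
    "0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]"
)


def parse_full_gc(line):
    """Return the fields of a Full GC line as a dict, or None if no match."""
    match = FULL_GC.search(line)
    if match is None:
        return None
    return {
        key: float(value) if key in ("uptime", "pause") else int(value)
        for key, value in match.groupdict().items()
    }


def old_gen_headroom_kb(event):
    """Free space in PSOldGen (in K) just before the collection."""
    return event["o_cap"] - event["o_before"]


def end_uptime(event):
    """Uptime at which the collection finished, assuming the stamp is its start."""
    return event["uptime"] + event["pause"]


if __name__ == "__main__":
    event = parse_full_gc(FIRST_FULL_GC)
    print(old_gen_headroom_kb(event))  # 5691
    print(end_uptime(event))
```

If the start-time assumption holds, the first Full GC ends at roughly 148.630, so the young collection stamped 148.631 would begin within about a millisecond of it finishing, which would be consistent with nothing having been allocated out of young gen in between.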

Bug in the collector? Did you check the bug database?

Regards,
Kirk


  #6  
02-06-2011 04:35 PM

I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.

Linux RHEL 5.6

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

(I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).

Startup parameters:

-server
-Xms256m
-Xmx256m
-XX:+HeapDumpOnOutOfMemoryError
-XX:MaxPermSize=64m
-verbose:gc
-XX:-UseGCOverheadLimit
-XX:+DisableExplicitGC
-XX:+UseParallelGC
-XX:+UseFastAccessorMethods
-XX:AdaptiveSizePolicyOutputInterval=1
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime

The complete GC log is available here:

http://dl.dropbox.com/u/3430279/gc.log

but here is a short snippet from that log before and after the OOM:

2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
Total time for which application threads were stopped: 1.2760680 seconds
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid18706.hprof ...
Application time: 0.9442610 seconds
Total time for which application threads were stopped: 2.4584870 seconds
Heap dump file created [83874513 bytes in 3.330 secs]
2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]

Note that:

1) I can reproduce this easily.

2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).

3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.

4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.

5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.

6) The heap dump in pid18706.hprof shows no objects in the finalization queue.

7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.

8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).

I'm not sure what else to try or where else to look. Any suggestions?

Cheers,
Raman Gupta
)
Raman,

The gc.log looks like it has the young collections
filtered out. Is that right? If so, please upload
the complete log.

Jon

On 05/27/11 10:57, Raman Gupta wrote:
> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>
> Linux RHEL 5.6
>
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>
> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>
> Startup parameters:
>
> -server
> -Xms256m
> -Xmx256m
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:MaxPermSize=64m
> -verbose:gc
> -XX:-UseGCOverheadLimit
> -XX:+DisableExplicitGC
> -XX:+UseParallelGC
> -XX:+UseFastAccessorMethods
> -XX:AdaptiveSizePolicyOutputInterval=1
> -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCApplicationConcurrentTime
> -XX:+PrintGCApplicationStoppedTime
>
> The complete GC log is available here:
>
> http://dl.dropbox.com/u/3430279/gc.log
>
> but here is a short snippet from that log before and after the OOM:
>
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
> Total time for which application threads were stopped: 1.2760680 seconds
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid18706.hprof ...
> Application time: 0.9442610 seconds
> Total time for which application threads were stopped: 2.4584870 seconds
> Heap dump file created [83874513 bytes in 3.330 secs]
> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>
> Note that:
>
> 1) I can reproduce this easily.
>
> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>
> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>
> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>
> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>
> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>
> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>
> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>
> I'm not sure what else to try or where else to look. Any suggestions?
>
> Cheers,
> Raman Gupta
)
+1, can't see anything in these logs that indicates you're heading to a OOME due to Java heap. But, incomplete log...

Regards,
Kirk

On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:

> Raman,
>
> The gc.log looks like it has the young collections
> filtered out. Is that right? If so, please upload
> the complete log.
>
> Jon
>
> On 05/27/11 10:57, Raman Gupta wrote:
>> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>>
>> Linux RHEL 5.6
>>
>> java version "1.6.0_24"
>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>>
>> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>>
>> Startup parameters:
>>
>> -server
>> -Xms256m
>> -Xmx256m
>> -XX:+HeapDumpOnOutOfMemoryError
>> -XX:MaxPermSize=64m
>> -verbose:gc
>> -XX:-UseGCOverheadLimit
>> -XX:+DisableExplicitGC
>> -XX:+UseParallelGC
>> -XX:+UseFastAccessorMethods
>> -XX:AdaptiveSizePolicyOutputInterval=1
>> -XX:+PrintGCDateStamps
>> -XX:+PrintGCTimeStamps
>> -XX:+PrintGCDetails
>> -XX:+PrintGCApplicationConcurrentTime
>> -XX:+PrintGCApplicationStoppedTime
>>
>> The complete GC log is available here:
>>
>> http://dl.dropbox.com/u/3430279/gc.log
>>
>> but here is a short snippet from that log before and after the OOM:
>>
>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>> Total time for which application threads were stopped: 1.2760680 seconds
>> java.lang.OutOfMemoryError: Java heap space
>> Dumping heap to java_pid18706.hprof ...
>> Application time: 0.9442610 seconds
>> Total time for which application threads were stopped: 2.4584870 seconds
>> Heap dump file created [83874513 bytes in 3.330 secs]
>> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>>
>> Note that:
>>
>> 1) I can reproduce this easily.
>>
>> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>>
>> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>>
>> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>>
>> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>>
>> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>>
>> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>>
>> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>>
>> I'm not sure what else to try or where else to look. Any suggestions?
>>
>> Cheers,
>> Raman Gupta

)
My bad -- I filtered out the young GCs with an errant grep command...
this log should be ok:

http://dl.dropbox.com/u/3430279/gc.log

This is from a different test run than the one before, but
demonstrates the same "problem".

Cheers,
Raman

On 06/01/2011 01:35 PM, Charles K Pepperdine wrote:
> +1, can't see anything in these logs that indicates you're heading to a OOME due to Java heap. But, incomplete log...
>
> Regards,
> Kirk
>
> On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:
>
>> Raman,
>>
>> The gc.log looks like it has the young collections
>> filtered out. Is that right? If so, please upload
>> the complete log.
>>
>> Jon
>>
>> On 05/27/11 10:57, Raman Gupta wrote:
>>> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>>>
>>> Linux RHEL 5.6
>>>
>>> java version "1.6.0_24"
>>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>>>
>>> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>>>
>>> Startup parameters:
>>>
>>> -server
>>> -Xms256m
>>> -Xmx256m
>>> -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:MaxPermSize=64m
>>> -verbose:gc
>>> -XX:-UseGCOverheadLimit
>>> -XX:+DisableExplicitGC
>>> -XX:+UseParallelGC
>>> -XX:+UseFastAccessorMethods
>>> -XX:AdaptiveSizePolicyOutputInterval=1
>>> -XX:+PrintGCDateStamps
>>> -XX:+PrintGCTimeStamps
>>> -XX:+PrintGCDetails
>>> -XX:+PrintGCApplicationConcurrentTime
>>> -XX:+PrintGCApplicationStoppedTime
>>>
>>> The complete GC log is available here:
>>>
>>> http://dl.dropbox.com/u/3430279/gc.log
>>>
>>> but here is a short snippet from that log before and after the OOM:
>>>
>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>> Total time for which application threads were stopped: 1.2760680 seconds
>>> java.lang.OutOfMemoryError: Java heap space
>>> Dumping heap to java_pid18706.hprof ...
>>> Application time: 0.9442610 seconds
>>> Total time for which application threads were stopped: 2.4584870 seconds
>>> Heap dump file created [83874513 bytes in 3.330 secs]
>>> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>>>
>>> Note that:
>>>
>>> 1) I can reproduce this easily.
>>>
>>> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>>>
>>> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>>>
>>> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>>>
>>> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>>>
>>> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>>>
>>> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>>>
>>> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>>>
>>> I'm not sure what else to try or where else to look. Any suggestions?
>>>
>>> Cheers,
>>> Raman Gupta
>
)
2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.65 (attempted to grow)
Tenured generation: 0.54 (attempted to grow)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.01 (attempted to grow)
Tenured generation: 0.54 (no change)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]

The first Full GC looks normal, as PSOldGen is (174784-169093)K = 5691K from being full. Nothing is happening in Perm.
The second is where things start to get weird. I don't see why that GC was called. Stranger still, its timestamp is ~800ms *after* that of the Full GC, and yet no application thread allocated any memory out of young gen.
For some reason that "failed" young gen collection triggers an immediate Full GC.
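
As a rough cross-check of the figures above, here is a minimal Python sketch that pulls the start timestamp, pause time, and PSOldGen occupancy out of the three quoted log lines, then computes the old gen headroom and the idle time between consecutive collections. The regular expressions and the names `EVENT`, `OLD_GEN`, and `parse` are illustrative assumptions written only against these three lines; this is not a general GC log parser.

```python
import re

# The three collections quoted above, copied verbatim from the log.
LOG = """\
2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
"""

# Uptime stamp (seconds), collection kind, and pause duration (seconds).
EVENT = re.compile(
    r": (?P<start>\d+\.\d+): \[(?P<kind>Full GC|GC) .*?, (?P<pause>\d+\.\d+) secs\]"
)
# Old gen occupancy before, after, and capacity, all in K.
OLD_GEN = re.compile(r"PSOldGen: (\d+)K->(\d+)K\((\d+)K\)")


def parse(log):
    """Return one dict per collection found in the log text."""
    events = []
    for line in log.splitlines():
        m = EVENT.search(line)
        if not m:
            continue
        old = OLD_GEN.search(line)
        events.append({
            "kind": m.group("kind"),
            "start": float(m.group("start")),
            "pause": float(m.group("pause")),
            # Young collections do not print a PSOldGen section.
            "old": tuple(int(x) for x in old.groups()) if old else None,
        })
    return events


events = parse(LOG)

# Headroom left in old gen when the first Full GC started.
before_k, _after_k, capacity_k = events[0]["old"]
headroom_k = capacity_k - before_k
print("old gen headroom (K):", headroom_k)

# Idle time between the end of one collection and the start of the next.
idle_gaps = [
    nxt["start"] - (cur["start"] + cur["pause"])
    for cur, nxt in zip(events, events[1:])
]
for cur, nxt, gap in zip(events, events[1:], idle_gaps):
    print(f"{cur['kind']} -> {nxt['kind']}: {gap * 1000:.3f} ms idle")
```

With these inputs the headroom comes out at 5691K, matching the arithmetic above, and both idle gaps are under a millisecond. One reading is that the ~800ms between the first two timestamps is almost entirely the first Full GC's own 0.807 s pause, which would also fit the single "Total time for which application threads were stopped: 1.2760680 seconds" line in the original post.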

Bug in the collector? Did you check the bug database?

Regards,
Kirk

)
I did check the database but didn't find anything relevant. My search
terms may not be optimal, though I did scan through all the results
returned by "java.lang.OutOfMemoryError: Java heap space" as well as
"0K->0K".

I also suspected a bug in the collector and so I tried the same test
with the G1 collector, with the same OOM result. I didn't save the log
from the G1 test, but I can quite easily redo the test with any set of
JVM parameters that may be helpful in debugging -- the OOM seems to be
easily and consistently reproducible with this application.

Cheers,
Raman

On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.65 (attempted to grow)
> Tenured generation: 0.54 (attempted to grow)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.01 (attempted to grow)
> Tenured generation: 0.54 (no change)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>
> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
> for some reason that "failed" young gen collection triggers an immediate Full GC.
>
> Bug in the collector? Did you check the bug database?
>
> Regards,
> Kirk
>
)

  #7  
02-06-2011 05:01 PM
Hotspot-gc-dev member admin is online now
User
 

Are you trying to create a humongous object or array, perhaps accidentally?
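
To illustrate why a single very large allocation could matter here, the following Python sketch models a request that has to fit in the free space of one generation. It uses the occupancy figures printed by the second Full GC in the log. The model is a deliberate simplification and an assumption on my part: it treats the young generation as one space (ignoring the eden/survivor split), ignores adaptive resizing, and `fits_somewhere` is a hypothetical helper, not anything HotSpot exposes. Nothing in the thread so far confirms that this is what the application does.

```python
KB = 1024
MB = 1024 * KB


def fits_somewhere(request_bytes, free_by_space):
    """Return the names of the spaces whose free bytes can hold the request."""
    return [name for name, free in free_by_space.items() if free >= request_bytes]


# Capacity minus occupancy after the second Full GC:
#   [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)]
free = {
    "PSYoungGen": (57856 - 0) * KB,
    "PSOldGen": (174784 - 52158) * KB,
}
total_free = sum(free.values())

for size_mb in (64, 128):
    request = size_mb * MB
    print(
        f"{size_mb} MB request:",
        "fits in", fits_somewhere(request, free) or "no single space",
        "| smaller than total free:", request <= total_free,
    )
```

Under these assumptions a 128 MB array is smaller than the total free heap yet larger than the free space in either generation, so it could fail with "Java heap space" even though the heap dump shows only about 70 MB of live objects.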

On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:

> I did check the database but didn't find anything relevant. My search
> terms may not be optimal, though I did scan through all the results
> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
> "0K->0K".
>
> I also suspected a bug in the collector and so I tried the same test
> with the G1 collector, with the same OOM result. I didn't save the log
> from the G1 test, but I can quite easily redo the test with any set of
> JVM parameters that may be helpful in debugging -- the OOM seems to be
> easily and consistently reproducible with this application.
>
> Cheers,
> Raman
>
> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>> GC overhead (%)
>> Young generation: 2.65 (attempted to grow)
>> Tenured generation: 0.54 (attempted to grow)
>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>> GC overhead (%)
>> Young generation: 2.01 (attempted to grow)
>> Tenured generation: 0.54 (no change)
>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>
>> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
>> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
>> for some reason that "failed" young gen collection triggers an immediate Full GC.
>>
>> Bug in the collector? Did you check the bug database?
>>
>> Regards,
>> Kirk
>>

)

  #8  
02-06-2011 05:56 PM
Hotspot-gc-dev member admin is online now
User
 

+1, I can't see anything in these logs that indicates you're heading to an OOME due to Java heap. But the log is incomplete...

Regards,
Kirk

On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:

> Raman,
>
> The gc.log looks like it has the young collections
> filtered out. Is that right? If so, please upload
> the complete log.
>
> Jon

)
My bad -- I filtered out the young GCs with an errant grep command...
this log should be ok:

http://dl.dropbox.com/u/3430279/gc.log

This is from a different test run than the one before, but
demonstrates the same "problem".

Cheers,
Raman

On 06/01/2011 01:35 PM, Charles K Pepperdine wrote:
> +1, can't see anything in these logs that indicates you're heading to a OOME due to Java heap. But, incomplete log...
>
> Regards,
> Kirk
)
2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.65 (attempted to grow)
Tenured generation: 0.54 (attempted to grow)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.01 (attempted to grow)
Tenured generation: 0.54 (no change)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]

The first Full GC looks normal: PSOldGen was (174784 - 169093)K = 5691K from being full, and nothing is happening in Perm.
The second collection is where things start to get weird. I don't see why that young GC was triggered. Stranger still, it ran ~800 ms *after* the Full GC, and yet no application thread had allocated any memory out of young gen (0K->0K).
For some reason that "failed" young gen collection triggers an immediate Full GC.

Bug in the collector? Did you check the bug database?

Regards,
Kirk
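
The headroom arithmetic above can be reproduced mechanically from a -XX:+PrintGCDetails line. What follows is only a minimal sketch, not a general GC log parser: the class and method names (GcLogHeadroom, parse, freeBeforeK, freeAfterK) are made up for illustration, and the regular expression assumes the ParallelGC log format quoted in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any JDK tool.
class GcLogHeadroom {
    // Matches one per-generation entry such as "[PSOldGen: 169093K->62518K(174784K)]".
    // Assumption: generation names start with "PS", as in the ParallelGC lines above.
    private static final Pattern GEN =
        Pattern.compile("\\[(PS\\w+): (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    /** Returns generation name -> {usedBeforeK, usedAfterK, capacityK}. */
    static Map<String, long[]> parse(String line) {
        Map<String, long[]> gens = new LinkedHashMap<String, long[]>();
        Matcher m = GEN.matcher(line);
        while (m.find()) {
            gens.put(m.group(1), new long[] {
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4))
            });
        }
        return gens;
    }

    /** Free space (in K) in the named generation just before the collection. */
    static long freeBeforeK(String line, String gen) {
        long[] v = parse(line).get(gen);
        return v[2] - v[0];
    }

    /** Free space (in K) in the named generation just after the collection. */
    static long freeAfterK(String line, String gen) {
        long[] v = parse(line).get(gen);
        return v[2] - v[1];
    }

    public static void main(String[] args) {
        // First Full GC line from the log excerpt in this thread (timings trimmed).
        String fullGc = "[Full GC [PSYoungGen: 7662K->0K(72896K)] "
            + "[PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) "
            + "[PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs]";
        for (String gen : parse(fullGc).keySet()) {
            System.out.println(gen + ": free before=" + freeBeforeK(fullGc, gen)
                + "K, free after=" + freeAfterK(fullGc, gen) + "K");
        }
    }
}
```

For the first Full GC line this gives 5691K free in PSOldGen before the collection and 112266K after it. The same calculation on the second Full GC line leaves 122626K free in PSOldGen, which is the figure to compare against the size of any suspected large allocation.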

)
I did check the database but didn't find anything relevant. My search
terms may not be optimal, though I did scan through all the results
returned by "java.lang.OutOfMemoryError: Java heap space" as well as
"0K->0K".

I also suspected a bug in the collector and so I tried the same test
with the G1 collector, with the same OOM result. I didn't save the log
from the G1 test, but I can quite easily redo the test with any set of
JVM parameters that may be helpful in debugging -- the OOM seems to be
easily and consistently reproducible with this application.

Cheers,
Raman

On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.65 (attempted to grow)
> Tenured generation: 0.54 (attempted to grow)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.01 (attempted to grow)
> Tenured generation: 0.54 (no change)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>
> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
> for some reason that "failed" young gen collection triggers an immediate Full GC.
>
> Bug in the collector? Did you check the bug database?
>
> Regards,
> Kirk
>
)
Are you trying to create a humongous object or array, perhaps accidentally?

On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:

> I did check the database but didn't find anything relevant. My search
> terms may not be optimal, though I did scan through all the results
> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
> "0K->0K".
>
> I also suspected a bug in the collector and so I tried the same test
> with the G1 collector, with the same OOM result. I didn't save the log
> from the G1 test, but I can quite easily redo the test with any set of
> JVM parameters that may be helpful in debugging -- the OOM seems to be
> easily and consistently reproducible with this application.
>
> Cheers,
> Raman

)
I do tend to think that somewhere a large object or array is being
created. In particular, Infinispan is one library we are using that
may be allocating large chunks of memory -- indeed, replacing
Infinispan with a local cache does seem to "fix" the problem.

However, more information from the JVM would really be useful in
isolating the offending code in Infinispan. Ideally,

a) any large allocations should show up as part of the heap dump if
the allocation succeeded but then some other subsequent code caused
the OOM, or

b) if the allocation itself failed, the OOM exception should include a
stack trace that would allow me to isolate the allocation point (as
it does normally, but for some reason in this case doesn't).

In this case the heap dump shows plenty of room in heap, and there is
no stack trace at the OOM, so I don't really have any way to isolate
the offending allocation point. In which situations does the OOM
exception get printed without an associated stack trace?

Cheers,
Raman
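
One way to narrow down an allocation point when the JVM itself gives no stack trace is to wrap suspect allocations in application code. The sketch below is an illustration only: AllocationProbe and probe are hypothetical names and not part of Infinispan or the JDK, it uses java.util.function.Supplier and lambdas (so it needs Java 8 or later, not the 1.6 JVM discussed here), and the caller has to supply the requested size estimate. It assumes, as in this thread, that the heap still has room for the small amount of logging done after the failure.

```java
import java.util.function.Supplier;

// Hypothetical helper for instrumenting suspect allocation sites.
class AllocationProbe {
    /**
     * Runs the allocation. If it fails with OutOfMemoryError, reports the label,
     * the caller-supplied size estimate and the current call stack, then rethrows.
     */
    static <T> T probe(String label, long requestedBytes, Supplier<T> allocation) {
        try {
            return allocation.get();
        } catch (OutOfMemoryError e) {
            Runtime rt = Runtime.getRuntime();
            // Building this message allocates a little; that is acceptable only
            // under the assumption that the failed request was one large object.
            System.err.println("OOM at '" + label + "': requested ~" + requestedBytes
                + " bytes, free=" + rt.freeMemory()
                + ", total=" + rt.totalMemory()
                + ", max=" + rt.maxMemory());
            // Capture the stack here in case the error itself carries no trace.
            new Throwable("allocation site: " + label).printStackTrace();
            throw e;
        }
    }

    public static void main(String[] args) {
        final int size = 1024;
        byte[] buf = probe("example-buffer", size, () -> new byte[size]);
        System.out.println("allocated " + buf.length + " bytes");
    }
}
```

The error is rethrown unchanged, so existing handling (including -XX:+HeapDumpOnOutOfMemoryError) is not affected. The probe only adds a record of which labelled site failed and roughly how much it asked for.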


On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
> are you trying to create a humungous object or array? Accidentally?
)

  #9  
02-06-2011 06:46 PM

I did check the database but didn't find anything relevant. My search
terms may not be optimal, though I did scan through all the results
returned by "java.lang.OutOfMemoryError: Java heap space" as well as
"0K->0K".

I also suspected a bug in the collector and so I tried the same test
with the G1 collector, with the same OOM result. I didn't save the log
from the G1 test, but I can quite easily redo the test with any set of
JVM parameters that may be helpful in debugging -- the OOM seems to be
easily and consistently reproducible with this application.

Cheers,
Raman

)
Are you trying to create a humongous object or array? Accidentally?


)
I do tend to think that somewhere a large object or array is being
created. In particular, Infinispan is one library we are using that
may be allocating large chunks of memory -- indeed, replacing
Infinispan with a local cache does seem to "fix" the problem.

However, more information from the JVM would really be useful in
isolating the offending code in Infinispan. Ideally,

a) any large allocations should show up as part of the heap dump if
the allocation succeeded but then some other subsequent code caused
the OOM, or

b) if the allocation itself failed, the OOM exception should include a
stack trace that would allow me to isolate the allocation point (as
it does normally, but for some reason in this case doesn't).

In this case the heap dump shows plenty of room in heap, and there is
no stack trace at the OOM, so I don't really have any way to isolate
the offending allocation point. In which situations does the OOM
exception get printed without an associated stack trace?

Cheers,
Raman


)
If your code is not catching the OOM exception you'd
expect to see the stack trace when the program dies.
If it catches the exception and carries on, you'd want
it to print the exception detail. I don't know of
cases where the exception would just disappear.

In your case the report to stdout/stderr(?) that an OOM occurred and that
the heap is being dumped comes from inside the JVM
because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
After this point, your allocating thread would have gotten
an OOME which it probably caught and swallowed, hence
the silence wrt the stack trace you would normally see. You
will want to look at your Infinispan code to see how
it deals with the inability to allocate said large objects.
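
To make the "caught and swallowed" case concrete, here is a small sketch. It is not Infinispan code; the class and method names are invented for illustration. It requests an array of Integer.MAX_VALUE longs, which I'd expect HotSpot to refuse regardless of heap size (the error message may then read differently from "Java heap space", but the handling is the same).

```java
// Hypothetical demo, not Infinispan code: contrasts swallowing an OutOfMemoryError
// with reporting it.
class OomeHandlingDemo {
    /** Swallows the error: the caller sees null and no stack trace is ever printed. */
    static long[] allocateQuietly(int length) {
        try {
            return new long[length];
        } catch (OutOfMemoryError e) {
            return null; // the evidence of the failed allocation is lost here
        }
    }

    /** Prints the trace before giving up, so the allocation site shows up in the output. */
    static long[] allocateLoudly(int length) {
        try {
            return new long[length];
        } catch (OutOfMemoryError e) {
            e.printStackTrace();
            return null;
        }
    }

    public static void main(String[] args) {
        int huge = Integer.MAX_VALUE; // assumed to be unsatisfiable on HotSpot
        System.out.println("quiet: "
            + (allocateQuietly(huge) == null ? "failed silently" : "allocated"));
        System.out.println("loud:  "
            + (allocateLoudly(huge) == null ? "failed, trace printed" : "allocated"));
    }
}
```

If a library handles the error the first way, the only sign of trouble would be whatever the JVM itself prints, which matches the bare "java.lang.OutOfMemoryError: Java heap space" line in the log.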

Recall that object size is limited by the size of and
available space in the largest area (Eden or Old) in your
Java heap. As Kirk noted, the full gc was to attempt allocation
of an object that didn't fit into the available space in
Eden or in Old (so from that you can estimate the size of
the request).
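
As a rough sketch of that estimate, using the figures from the second Full GC line in the log: the request fit neither in young gen nor in the free part of old gen, so it was presumably larger than the bigger of the two. The class below is hypothetical, it treats the whole PSYoungGen capacity as an upper bound for Eden (Eden is only part of it), and it assumes the old generation could not grow any further.

```java
// Hypothetical arithmetic only; figures are copied from the second Full GC line in the log.
class FailedRequestEstimate {
    /** Free space in a generation after a collection, in KB. */
    static long freeK(long capacityK, long usedAfterK) {
        return capacityK - usedAfterK;
    }

    /** Rough lower bound: the request fit in neither space, so it exceeded the larger one. */
    static long lowerBoundK(long youngFreeK, long oldFreeK) {
        return Math.max(youngFreeK, oldFreeK);
    }

    public static void main(String[] args) {
        long oldFree = freeK(174784, 52158); // PSOldGen: 62518K->52158K(174784K)
        long youngFree = freeK(57856, 0);    // PSYoungGen: 0K->0K(57856K), upper bound for Eden
        System.out.println("request larger than roughly "
            + lowerBoundK(youngFree, oldFree) + "K");
    }
}
```

With these numbers the bound comes out at 122626K, i.e. about 120 MB in a single allocation, which is large for a 256 MB heap.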

Note also that the JDK libraries will resize hashtables under
you and that can also cause large allocation requests
(but I don't know how they handle OOMs resulting from such
allocations).
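
To give a feel for how big such a resize request can get, here is a back-of-envelope sketch. java.util.HashMap doubles its bucket array on resize and java.util.Hashtable grows to 2n + 1 buckets; the new array has to be allocated in one piece while the old one is still live. The starting bucket count and the reference size below are assumptions for illustration, not measurements from this application.

```java
// Hypothetical back-of-envelope helper; real sizes depend on the JDK version and JVM settings.
class ResizeCost {
    /** Bytes for a reference array of the given length, ignoring the array header. */
    static long tableBytes(long buckets, int bytesPerReference) {
        return buckets * bytesPerReference;
    }

    /** java.util.HashMap doubles its bucket array when it resizes. */
    static long hashMapBucketsAfterResize(long buckets) {
        return buckets * 2;
    }

    /** java.util.Hashtable grows to 2n + 1 buckets when it rehashes. */
    static long hashtableBucketsAfterRehash(long buckets) {
        return buckets * 2 + 1;
    }

    public static void main(String[] args) {
        long buckets = 1L << 22; // assumed current table size, for illustration only
        int refBytes = 8;        // assumed uncompressed 64-bit references
        long next = hashMapBucketsAfterResize(buckets);
        System.out.println("new table: "
            + tableBytes(next, refBytes) / (1024 * 1024) + " MB");
    }
}
```

Under those assumptions a single resize asks for a 64 MB array, so a large enough map could plausibly produce a request of the size estimated earlier in this message.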

-- ramki

)

  #10  
02-06-2011 06:54 PM

I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.

Linux RHEL 5.6

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

(I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).

Startup parameters:

-server
-Xms256m
-Xmx256m
-XX:+HeapDumpOnOutOfMemoryError
-XX:MaxPermSize=64m
-verbose:gc
-XX:-UseGCOverheadLimit
-XX:+DisableExplicitGC
-XX:+UseParallelGC
-XX:+UseFastAccessorMethods
-XX:AdaptiveSizePolicyOutputInterval=1
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime

The complete GC log is available here:

http://dl.dropbox.com/u/3430279/gc.log

but here is a short snippet from that log before and after the OOM:

2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
Total time for which application threads were stopped: 1.2760680 seconds
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid18706.hprof ...
Application time: 0.9442610 seconds
Total time for which application threads were stopped: 2.4584870 seconds
Heap dump file created [83874513 bytes in 3.330 secs]
2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]

Note that:

1) I can reproduce this easily.

2) This seems to happen only once, shortly after load is applied to the system, which in turn is shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).

3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.

4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.

5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.

6) The heap dump in pid18706.hprof shows no objects in the finalization queue.

7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.

8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).

I'm not sure what else to try or where else to look. Any suggestions?

Cheers,
Raman Gupta
)
Raman,

The gc.log looks like it has the young collections
filtered out. Is that right? If so, please upload
the complete log.

Jon

)
+1, can't see anything in these logs that indicates you're heading to an OOME due to Java heap. But, incomplete log...

Regards,
Kirk


)
My bad -- I filtered out the young GCs with an errant grep command...
this log should be ok:

http://dl.dropbox.com/u/3430279/gc.log

This is from a different test run than the one before, but
demonstrates the same "problem".
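
For anyone wanting to check a log for the same filtering mistake, a quick sketch: count young and full collections separately and see whether the young count is suspiciously zero. This assumes the "[GC " and "[Full GC " markers shown in this thread; the class name and the abbreviated sample lines are for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical checker: counts young vs. full collections in PrintGCDetails output.
class GcLogKinds {
    static boolean isFullGc(String line) {
        return line.contains("[Full GC ");
    }

    static boolean isYoungGc(String line) {
        // "[Full GC " never matches this, because '[' is not directly followed by 'G' there.
        return line.contains("[GC ");
    }

    static long count(List<String> lines, boolean full) {
        long n = 0;
        for (String line : lines) {
            if (full ? isFullGc(line) : isYoungGc(line)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Lines abbreviated from the log excerpt quoted in this thread.
        List<String> sample = Arrays.asList(
            "147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] ...",
            "148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), ...",
            "148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] ...");
        System.out.println("young=" + count(sample, false) + " full=" + count(sample, true));
    }
}
```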

Cheers,
Raman

)
I do tend to think that somewhere a large object or array is being
created. In particular, Infinispan is one library we are using that
may be allocating large chunks of memory -- indeed, replacing
Infinispan with a local cache does seem to "fix" the problem.

However, more information from the JVM would really be useful in
isolating the offending code in Infinispan. Ideally,

a) any large allocations should show up as part of the heap dump if
the allocation succeeded but then some other subsequent code caused
the OOM, or

b) if the allocation itself failed, the OOM exception should include a
stack trace that would allow me to isolate the allocation point (as
it does normally, but for some reason in this case doesn't).

In this case the heap dump shows plenty of room in heap, and there is
no stack trace at the OOM, so I don't really have any way to isolate
the offending allocation point. In which situations does the OOM
exception get printed without an associated stack trace?

Cheers,
Raman


On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
> are you trying to create a humungous object or array? Accidentally?
)
If your code is not catching the OOM exception, you'd
expect to see the stack trace when the program dies.
If it catches the exception and carries on, you'd want
it to print the exception detail. I don't know of
cases where the exception would just disappear.

In your case the report to stdout/stderr(?) that an OOM occurred and that
the heap is being dumped comes from inside the JVM,
because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
After this point, your allocating thread would have gotten
an OOME, which it probably caught and swallowed; hence
the silence wrt the stack trace you would normally see. You
will want to look at your Infinispan code to see how
it deals with the inability to allocate said large objects.
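To illustrate the "caught and swallowed" case described above, here is a minimal sketch. It is not Infinispan code: the class and method names are hypothetical, and the oversized array is only a convenient way to provoke an OutOfMemoryError. The point is that once a catch block absorbs the error, the application itself prints no stack trace.

```java
class SwallowedOome {
    /**
     * Tries to allocate a long[] of the requested length.
     * Returns false if the JVM refused the allocation.
     */
    static boolean tryAllocate(int length) {
        try {
            long[] block = new long[length];
            return block.length == length;
        } catch (OutOfMemoryError e) {
            // Swallowed: nothing is logged or rethrown, so the
            // application never prints a stack trace for this failure.
            return false;
        }
    }

    public static void main(String[] args) {
        // Integer.MAX_VALUE longs would need roughly 16 GB; on HotSpot this
        // request should fail whatever the heap size.
        boolean ok = tryAllocate(Integer.MAX_VALUE);
        System.out.println("allocation succeeded: " + ok);
    }
}
```

Replacing the silent `return false` with `e.printStackTrace()` (or a rethrow) is what would make the allocation site visible again.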

Recall that object size is limited by the size of and
available space in the largest area (Eden or Old) in your
Java heap. As Kirk noted, the full gc was to attempt allocation
of an object that didn't fit into the available space in
Eden or in Old (so from that you can estimate the size of
the request).
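As a rough worked example of that estimate, the sketch below uses the figures from the second Full GC in the log (PSYoungGen 0K of 57856K, PSOldGen 52158K of 174784K). The class and method names are made up for illustration. It assumes the allocation failed purely for lack of space in either area, and it treats the whole reported young generation capacity as if it were Eden, which overstates Eden somewhat.

```java
class RequestSizeEstimate {
    /** Free space in a region, in KB, given its capacity and occupancy. */
    static long freeKb(long capacityKb, long usedKb) {
        return capacityKb - usedKb;
    }

    /** A request that fit in neither region must exceed the larger free area. */
    static long lowerBoundKb(long youngFreeKb, long oldFreeKb) {
        return Math.max(youngFreeKb, oldFreeKb);
    }

    public static void main(String[] args) {
        long youngFree = freeKb(57856, 0);        // after the second Full GC
        long oldFree = freeKb(174784, 52158);     // after the second Full GC
        long bound = lowerBoundKb(youngFree, oldFree);
        System.out.println("failed request was larger than about " + bound + " KB");
    }
}
```

Under those assumptions the failed request would have been larger than about 122626 KB, i.e. close to 120 MB.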

Note also that the JDK libraries will resize hashtables under
you, and that can also cause large allocation requests
(but I don't know how they handle OOMEs resulting from such
allocations).
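To give a feel for how large such a resize request can be, here is a simplified model of a java.util.HashMap with default settings (initial capacity 16, load factor 0.75, table doubled on each resize). The class and method names are hypothetical, the 8 bytes per reference is an assumption (it would be 4 with compressed references), and the array header is ignored.

```java
class ResizeEstimate {
    /** Table length a default HashMap would need to hold the given entry count. */
    static long tableLength(long entries) {
        long capacity = 16;                     // default initial capacity
        while (entries > capacity * 3 / 4) {    // default load factor 0.75
            capacity *= 2;                      // each resize doubles the table
        }
        return capacity;
    }

    /** Approximate size in bytes of the backing array of references. */
    static long tableBytes(long entries, int bytesPerReference) {
        return tableLength(entries) * bytesPerReference;
    }

    public static void main(String[] args) {
        long entries = 1000000;
        System.out.println(tableLength(entries) + " slots, about "
                + tableBytes(entries, 8) + " bytes of references");
    }
}
```

Each resize asks for the new table as a single contiguous array while the old one is still live, so the request grows with the map even though no individual entry is large.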

-- ramki

On 06/02/11 09:56, Raman Gupta wrote:
> I do tend to think that somewhere a large object or array is being
> created. In particular, Infinispan is one library we are using that
> may be allocating large chunks of memory -- indeed, replacing
> Infinispan with a local cache does seem to "fix" the problem.
>
> However, more information from the JVM would really be useful in
> isolating the offending code in Infinispan. Ideally,
>
> a) any large allocations should show up as part of the heap dump if
> the allocation succeeded but then some other subsequent code caused
> the OOM, or
>
> b) if the allocation itself failed, the OOM exception should include a
> stack trace that would allow me to isolate the allocation point (as
> it does normally, but for some reason in this case doesn't).
>
> In this case the heap dump shows plenty of room in heap, and there is
> no stack trace at the OOM, so I don't really have any way to isolate
> the offending allocation point. In which situations does the OOM
> exception get printed without an associated stack trace?
>
> Cheers,
> Raman
)
Well, GC is somewhat orthogonal to what your application is up to, except for this special case. I've cc'ed Manik on this one; maybe he's had someone run into it before.

Regards,
Kirk

On Jun 2, 2011, at 6:56 PM, Raman Gupta wrote:

> I do tend to think that somewhere a large object or array is being
> created. In particular, Infinispan is one library we are using that
> may be allocating large chunks of memory -- indeed, replacing
> Infinispan with a local cache does seem to "fix" the problem.
>
> However, more information from the JVM would really be useful in
> isolating the offending code in Infinispan. Ideally,
>
> a) any large allocations should show up as part of the heap dump if
> the allocation succeeded but then some other subsequent code caused
> the OOM, or
>
> b) if the allocation itself failed, the OOM exception should include a
> stack trace that would allow me to isolate the allocation point (as
> it does normally, but for some reason in this case doesn't).
>
> In this case the heap dump shows plenty of room in heap, and there is
> no stack trace at the OOM, so I don't really have any way to isolate
> the offending allocation point. In which situations does the OOM
> exception get printed without an associated stack trace?
>
> Cheers,
> Raman

)

  #11  
03-06-2011 02:15 AM

I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.

Linux RHEL 5.6

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

(I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).

Startup parameters:

-server
-Xms256m
-Xmx256m
-XX:+HeapDumpOnOutOfMemoryError
-XX:MaxPermSize=64m
-verbose:gc
-XX:-UseGCOverheadLimit
-XX:+DisableExplicitGC
-XX:+UseParallelGC
-XX:+UseFastAccessorMethods
-XX:AdaptiveSizePolicyOutputInterval=1
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime

The complete GC log is available here:

http://dl.dropbox.com/u/3430279/gc.log

but here is a short snippet from that log before and after the OOM:

2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
Total time for which application threads were stopped: 1.2760680 seconds
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid18706.hprof ...
Application time: 0.9442610 seconds
Total time for which application threads were stopped: 2.4584870 seconds
Heap dump file created [83874513 bytes in 3.330 secs]
2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]

Note that:

1) I can reproduce this easily.

2) This seems to happen only once, soon after load is applied to the system (which is itself shortly after startup) -- after that everything seems fine (though I haven't yet verified this with a longer test).

3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.

4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.

5) The heap dump in java_pid18706.hprof shows only 70 MB of live objects in the heap.

6) The heap dump in java_pid18706.hprof shows no objects in the finalization queue.

7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.

8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).

I'm not sure what else to try or where else to look. Any suggestions?

Cheers,
Raman Gupta
)
Raman,

The gc.log looks like it has the young collections
filtered out. Is that right? If so, please upload
the complete log.

Jon

On 05/27/11 10:57, Raman Gupta wrote:
> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>
> Linux RHEL 5.6
>
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>
> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>
> Startup parameters:
>
> -server
> -Xms256m
> -Xmx256m
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:MaxPermSize=64m
> -verbose:gc
> -XX:-UseGCOverheadLimit
> -XX:+DisableExplicitGC
> -XX:+UseParallelGC
> -XX:+UseFastAccessorMethods
> -XX:AdaptiveSizePolicyOutputInterval=1
> -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCApplicationConcurrentTime
> -XX:+PrintGCApplicationStoppedTime
>
> The complete GC log is available here:
>
> http://dl.dropbox.com/u/3430279/gc.log
>
> but here is a short snippet from that log before and after the OOM:
>
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
> Total time for which application threads were stopped: 1.2760680 seconds
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid18706.hprof ...
> Application time: 0.9442610 seconds
> Total time for which application threads were stopped: 2.4584870 seconds
> Heap dump file created [83874513 bytes in 3.330 secs]
> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>
> Note that:
>
> 1) I can reproduce this easily.
>
> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>
> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>
> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>
> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>
> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>
> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>
> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>
> I'm not sure what else to try or where else to look. Any suggestions?
>
> Cheers,
> Raman Gupta
)
+1, can't see anything in these logs that indicates you're heading for an OOME due to Java heap. But, incomplete log...

Regards,
Kirk

On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:

> Raman,
>
> The gc.log looks like it has the young collections
> filtered out. Is that right? If so, please upload
> the complete log.
>
> Jon

)
My bad -- I filtered out the young GCs with an errant grep command...
this log should be ok:

http://dl.dropbox.com/u/3430279/gc.log

This is from a different test run than the one before, but
demonstrates the same "problem".

Cheers,
Raman

On 06/01/2011 01:35 PM, Charles K Pepperdine wrote:
> +1, can't see anything in these logs that indicates you're heading to a OOME due to Java heap. But, incomplete log...
>
> Regards,
> Kirk
)
2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.65 (attempted to grow)
Tenured generation: 0.54 (attempted to grow)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.01 (attempted to grow)
Tenured generation: 0.54 (no change)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]

First Full GC looks normal, as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
The second collection is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the first Full GC, and yet no application thread allocated any memory out of young gen.
For some reason that "failed" young gen collection triggers an immediate Full GC.
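The headroom arithmetic above can be reproduced mechanically from a log line. The following is a small sketch, with a hypothetical class name and a regular expression written only for the `[PSOldGen: before->after(capacity)]` form that appears in this log; it is not a general GC log parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcLineHeadroom {
    // Matches e.g. "[PSOldGen: 169093K->62518K(174784K)]"
    private static final Pattern OLD_GEN =
            Pattern.compile("\\[PSOldGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    /** Returns {usedBeforeKb, usedAfterKb, capacityKb} for the old generation. */
    static long[] parseOldGen(String line) {
        Matcher m = OLD_GEN.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no PSOldGen entry in: " + line);
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    /** Space left in the old generation just before the collection, in KB. */
    static long headroomBeforeKb(String line) {
        long[] v = parseOldGen(line);
        return v[2] - v[0];
    }

    public static void main(String[] args) {
        String firstFullGc = "147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] "
                + "[PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) "
                + "[PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs]";
        System.out.println(headroomBeforeKb(firstFullGc) + "K of old gen headroom");
    }
}
```

Run against the first Full GC line, this gives the same 5691K figure quoted above.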

Bug in the collector? Did you check the bug database?

Regards,
Kirk

)
I did check the database but didn't find anything relevant. My search
terms may not be optimal, though I did scan through all the results
returned by "java.lang.OutOfMemoryError: Java heap space" as well as
"0K->0K".

I also suspected a bug in the collector and so I tried the same test
with the G1 collector, with the same OOM result. I didn't save the log
from the G1 test, but I can quite easily redo the test with any set of
JVM parameters that may be helpful in debugging -- the OOM seems to be
easily and consistently reproducible with this application.

Cheers,
Raman

On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.65 (attempted to grow)
> Tenured generation: 0.54 (attempted to grow)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.01 (attempted to grow)
> Tenured generation: 0.54 (no change)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>
> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
> for some reason that "failed" young gen collection triggers an immediate Full GC.
>
> Bug in the collector? Did you check the bug database?
>
> Regards,
> Kirk
>
)
Are you trying to create a humongous object or array? Accidentally, perhaps?

On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:

> I did check the database but didn't find anything relevant. My search
> terms may not be optimal, though I did scan through all the results
> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
> "0K->0K".
>
> I also suspected a bug in the collector and so I tried the same test
> with the G1 collector, with the same OOM result. I didn't save the log
> from the G1 test, but I can quite easily redo the test with any set of
> JVM parameters that may be helpful in debugging -- the OOM seems to be
> easily and consistently reproducible with this application.
>
> Cheers,
> Raman
>
> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>> GC overhead (%)
>> Young generation: 2.65 (attempted to grow)
>> Tenured generation: 0.54 (attempted to grow)
>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>> GC overhead (%)
>> Young generation: 2.01 (attempted to grow)
>> Tenured generation: 0.54 (no change)
>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>
>> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
>> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
>> for some reason that "failed" young gen collection triggers an immediate Full GC.
>>
>> Bug in the collector? Did you check the bug database?
>>
>> Regards,
>> Kirk
>>

I do tend to think that somewhere a large object or array is being
created. In particular, Infinispan is one library we are using that
may be allocating large chunks of memory -- indeed, replacing
Infinispan with a local cache does seem to "fix" the problem.

However, more information from the JVM would really be useful in
isolating the offending code in Infinispan. Ideally,

a) any large allocations should show up as part of the heap dump if
the allocation succeeded but then some other subsequent code caused
the OOM, or

b) if the allocation itself failed, the OOM exception should include a
stack trace that would allow me to isolate the allocation point (as
it does normally, but for some reason in this case doesn't).

In this case the heap dump shows plenty of room in heap, and there is
no stack trace at the OOM, so I don't really have any way to isolate
the offending allocation point. In which situations does the OOM
exception get printed without an associated stack trace?

Cheers,
Raman


On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
> are you trying to create a humungous object or array? Accidentally?
>
> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>
>> I did check the database but didn't find anything relevant. My search
>> terms may not be optimal, though I did scan through all the results
>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>> "0K->0K".
>>
>> I also suspected a bug in the collector and so I tried the same test
>> with the G1 collector, with the same OOM result. I didn't save the log
>> from the G1 test, but I can quite easily redo the test with any set of
>> JVM parameters that may be helpful in debugging -- the OOM seems to be
>> easily and consistently reproducible with this application.
>>
>> Cheers,
>> Raman
>>
>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>> GC overhead (%)
>>> Young generation: 2.65 (attempted to grow)
>>> Tenured generation: 0.54 (attempted to grow)
>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>> GC overhead (%)
>>> Young generation: 2.01 (attempted to grow)
>>> Tenured generation: 0.54 (no change)
>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>
>>> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
>>> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
>>> for some reason that "failed" young gen collection triggers an immediate Full GC.
>>>
>>> Bug in the collector? Did you check the bug database?
>>>
>>> Regards,
>>> Kirk
>>>
>
If your code is not catching the OOM exception, you'd
expect to see the stack trace when the program dies.
If it catches the exception and carries on, you'd want
it to print the exception detail. I don't know of
cases where the exception would just disappear.

In your case, the report to stdout/stderr(?) that an OOM occurred and that
the heap is being dumped comes from inside the JVM
because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
After this point, your allocating thread would have gotten
an OOME, which it probably caught and swallowed; hence
the silence wrt the stack trace you would normally see. You
will want to look at your Infinispan code to see how
it deals with the inability to allocate said large objects.
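
To make the "caught and swallowed" case concrete, here is a minimal sketch. It is not taken from Infinispan or from any JDK class; the class and method names (SwallowedOomeDemo, runSwallowing, runReporting) are hypothetical and exist only for this illustration. The first method drops the error, so no stack trace ever reaches the log. The second recovers in the same way but captures the trace first.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class SwallowedOomeDemo {

    /** Anti-pattern: the error is caught and dropped, so no stack trace is printed. */
    static boolean runSwallowing(Runnable allocation) {
        try {
            allocation.run();
            return true;
        } catch (OutOfMemoryError e) {
            return false; // silently swallowed
        }
    }

    /** Same recovery, but the stack trace is captured so the allocation site can be found. */
    static String runReporting(Runnable allocation) {
        try {
            allocation.run();
            return null; // no error, nothing to report
        } catch (OutOfMemoryError e) {
            StringWriter trace = new StringWriter();
            e.printStackTrace(new PrintWriter(trace, true));
            return trace.toString();
        }
    }

    public static void main(String[] args) {
        // Roughly 16 GB of longs: expected to fail on a small heap such as -Xmx256m,
        // but it may succeed on a JVM with a very large heap.
        Runnable hugeRequest = () -> {
            long[] block = new long[Integer.MAX_VALUE - 8];
            System.out.println("allocated " + block.length + " longs");
        };
        System.out.println("swallowing variant succeeded: " + runSwallowing(hugeRequest));
        System.out.println("reporting variant trace: " + runReporting(hugeRequest));
    }
}
```

Run with -XX:+HeapDumpOnOutOfMemoryError, the swallowing variant would be expected to leave only the JVM's own OOM and heap dump messages in the output, much like the log in this thread.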

Recall that object size is limited by the size of and
available space in the largest area (Eden or Old) in your
Java heap. As Kirk noted, the full gc was to attempt allocation
of an object that didn't fit into the available space in
Eden or in Old (so from that you can estimate the size of
the request).
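
As a rough illustration of that estimate, the sketch below parses the last Full GC line from the log in this thread and reports the free space left in each generation (capacity minus occupancy after the collection). The class name GcLineHeadroom and the regular expression are assumptions made for this example, not part of any JVM tooling. The result is only an approximate lower bound on the failed request: the PSYoungGen capacity in the log includes a survivor space, so Eden itself is smaller than the figure shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcLineHeadroom {

    // Matches entries such as "PSOldGen: 62518K->52158K(174784K)".
    private static final Pattern GENERATION =
            Pattern.compile("(\\w+): (\\d+)K->(\\d+)K\\((\\d+)K\\)");

    /** Returns the free space in KB (capacity minus occupancy after GC) per generation. */
    static Map<String, Long> freeAfterGcKb(String gcLogLine) {
        Map<String, Long> free = new LinkedHashMap<>();
        Matcher m = GENERATION.matcher(gcLogLine);
        while (m.find()) {
            long after = Long.parseLong(m.group(3));
            long capacity = Long.parseLong(m.group(4));
            free.put(m.group(1), capacity - after);
        }
        return free;
    }

    public static void main(String[] args) {
        String line = "2011-05-27T11:35:19.022-0400: 148.632: [Full GC "
                + "[PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] "
                + "62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs]";
        for (Map.Entry<String, Long> e : freeAfterGcKb(line).entrySet()) {
            System.out.println(e.getKey() + " free after GC: " + e.getValue() + "K");
        }
    }
}
```

For the line above this gives 122626K free in PSOldGen, which would suggest a single request of more than roughly 120 MB, assuming the old generation could not grow any further.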

Note also that the JDK libraries will resize hashtables under
you, and that can also cause large allocation requests
(but I don't know how they handle OOMs resulting from such
allocations).

-- ramki

On 06/02/11 09:56, Raman Gupta wrote:
> I do tend to think that somewhere a large object or array is being
> created. In particular, Infinispan is one library we are using that
> may be allocating large chunks of memory -- indeed, replacing
> Infinispan with a local cache does seem to "fix" the problem.
>
> However, more information from the JVM would really be useful in
> isolating the offending code in Infinispan. Ideally,
>
> a) any large allocations should show up as part of the heap dump if
> the allocation succeeded but then some other subsequent code caused
> the OOM, or
>
> b) if the allocation itself failed, the OOM exception should include a
> stack trace that would allow me to isolate the allocation point (as
> it does normally, but for some reason in this case doesn't).
>
> In this case the heap dump shows plenty of room in heap, and there is
> no stack trace at the OOM, so I don't really have any way to isolate
> the offending allocation point. In which situations does the OOM
> exception get printed without an associated stack trace?
>
> Cheers,
> Raman
>
>
> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
>> are you trying to create a humungous object or array? Accidentally?
>>
>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>>
>>> I did check the database but didn't find anything relevant. My search
>>> terms may not be optimal, though I did scan through all the results
>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>>> "0K->0K".
>>>
>>> I also suspected a bug in the collector and so I tried the same test
>>> with the G1 collector, with the same OOM result. I didn't save the log
>>> from the G1 test, but I can quite easily redo the test with any set of
>>> JVM parameters that may be helpful in debugging -- the OOM seems to be
>>> easily and consistently reproducible with this application.
>>>
>>> Cheers,
>>> Raman
>>>
>>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>> GC overhead (%)
>>>> Young generation: 2.65 (attempted to grow)
>>>> Tenured generation: 0.54 (attempted to grow)
>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>> GC overhead (%)
>>>> Young generation: 2.01 (attempted to grow)
>>>> Tenured generation: 0.54 (no change)
>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>>
>>>> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
>>>> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
>>>> for some reason that "failed" young gen collection triggers an immediate Full GC.
>>>>
>>>> Bug in the collector? Did you check the bug database?
>>>>
>>>> Regards,
>>>> Kirk
>>>>
Well, GC is somewhat orthogonal to what your application is up to, except for this special case. I've cc'ed Manik on this one; maybe he's had someone run into it before.

Regards,
Kirk

On Jun 2, 2011, at 6:56 PM, Raman Gupta wrote:

> I do tend to think that somewhere a large object or array is being
> created. In particular, Infinispan is one library we are using that
> may be allocating large chunks of memory -- indeed, replacing
> Infinispan with a local cache does seem to "fix" the problem.
>
> However, more information from the JVM would really be useful in
> isolating the offending code in Infinispan. Ideally,
>
> a) any large allocations should show up as part of the heap dump if
> the allocation succeeded but then some other subsequent code caused
> the OOM, or
>
> b) if the allocation itself failed, the OOM exception should include a
> stack trace that would allow me to isolate the allocation point (as
> it does normally, but for some reason in this case doesn't).
>
> In this case the heap dump shows plenty of room in heap, and there is
> no stack trace at the OOM, so I don't really have any way to isolate
> the offending allocation point. In which situations does the OOM
> exception get printed without an associated stack trace?
>
> Cheers,
> Raman
>
>
> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
>> are you trying to create a humungous object or array? Accidentally?
>>
>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>>
>>> I did check the database but didn't find anything relevant. My search
>>> terms may not be optimal, though I did scan through all the results
>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>>> "0K->0K".
>>>
>>> I also suspected a bug in the collector and so I tried the same test
>>> with the G1 collector, with the same OOM result. I didn't save the log
>>> from the G1 test, but I can quite easily redo the test with any set of
>>> JVM parameters that may be helpful in debugging -- the OOM seems to be
>>> easily and consistently reproducible with this application.
>>>
>>> Cheers,
>>> Raman
>>>
>>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>> GC overhead (%)
>>>> Young generation: 2.65 (attempted to grow)
>>>> Tenured generation: 0.54 (attempted to grow)
>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>> GC overhead (%)
>>>> Young generation: 2.01 (attempted to grow)
>>>> Tenured generation: 0.54 (no change)
>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>>
>>>> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
>>>> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
>>>> for some reason that "failed" young gen collection triggers an immediate Full GC.
>>>>
>>>> Bug in the collector? Did you check the bug database?
>>>>
>>>> Regards,
>>>> Kirk
>>>>
>>

It would be *really* handy if there were a switch like:

-XX:+StackTraceOnOutOfMemoryError

to force the stack trace to be shown. Obviously looking at every line
of code of every library my application uses, including core JDK
libraries, for code paths where large amounts of heap may be allocated
and the associated OOME is caught and swallowed, is pretty much
impossible.

I think my next step is to increase the max heap size to a large value
which hopefully allows the large allocation to occur without failure,
and then periodically take heap dumps to isolate it.
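
One possible way to take those periodic dumps from inside the process is the HotSpotDiagnosticMXBean that HotSpot JVMs expose. The sketch below is only an assumption about how this could be wired up: the class name PeriodicHeapDumper, the file naming, the dump count and the one-minute interval are all hypothetical, and the loop would have to run in a thread inside the application under test, since it dumps the heap of the JVM it runs in. Running jmap against the process from outside would be an alternative.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

final class PeriodicHeapDumper {

    private static final String HOTSPOT_BEAN = "com.sun.management:type=HotSpotDiagnostic";

    /** Writes one heap dump of live objects to the given file, which must not exist yet. */
    static File dump(File target) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                HOTSPOT_BEAN,
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.getAbsolutePath(), true); // true = live objects only
        return target;
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(args.length > 0 ? args[0] : ".");
        long intervalMillis = 60000L; // hypothetical interval between dumps
        for (int i = 0; i < 5; i++) {  // hypothetical number of dumps
            File out = dump(new File(dir, "periodic-" + System.currentTimeMillis() + ".hprof"));
            System.out.println("wrote " + out + " (" + out.length() + " bytes)");
            Thread.sleep(intervalMillis);
        }
    }
}
```

Comparing successive dumps should show which arrays or collections grow between samples, provided the large allocation succeeds and stays reachable long enough to be captured.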

Thanks,
Raman

On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
> If your code is not catching the OOM exception you'd
> expect to see the stack retrace when the program dies.
> If it catches the exception and carries on, you'd want
> it to print the exception detail. I don't know of
> cases where the exception would just disappear.
>
> In your case the report to stdout/stderr(?)that an OOM occurred and that
> the heap is being dumped comes from inside the JVM
> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
> After this point, your allocating thread would have gotten
> an OOME which it probably caught and swallowed, and hence
> the silence wrt the stack retrace you would normally see. You
> will want to look at your Infinispan code to see how
> it deals with the inability to allocate said large objects.
>
> Recall that object size is limited by the size of and
> available space in the largest area (Eden or Old) in your
> Java heap. As Kirk noted, the full gc was to attempt allocation
> of an object that didn't fit into the available space in
> Eden or in Old (so from that you can estimate the size of
> the request).
>
> Note also that the JDK libraries will resize hashtables under
> you and that can also cause large allocation requests
> (but i don't know how they handle OOM's resulting from such
> allocations).
>
> -- ramki
>
> On 06/02/11 09:56, Raman Gupta wrote:
>> I do tend to think that somewhere a large object or array is being
>> created. In particular, Infinispan is one library we are using that
>> may be allocating large chunks of memory -- indeed, replacing
>> Infinispan with a local cache does seem to "fix" the problem.
>>
>> However, more information from the JVM would really be useful in
>> isolating the offending code in Infinispan. Ideally,
>>
>> a) any large allocations should show up as part of the heap dump if
>> the allocation succeeded but then some other subsequent code caused
>> the OOM, or
>>
>> b) if the allocation itself failed, the OOM exception should include a
>> stack trace that would allow me to isolate the allocation point (as
>> it does normally, but for some reason in this case doesn't).
>>
>> In this case the heap dump shows plenty of room in heap, and there is
>> no stack trace at the OOM, so I don't really have any way to isolate
>> the offending allocation point. In which situations does the OOM
>> exception get printed without an associated stack trace?
>>
>> Cheers,
>> Raman
>>
>>
>> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
>>> are you trying to create a humungous object or array? Accidentally?
>>>
>>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>>>
>>>> I did check the database but didn't find anything relevant. My search
>>>> terms may not be optimal, though I did scan through all the results
>>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>>>> "0K->0K".
>>>>
>>>> I also suspected a bug in the collector and so I tried the same test
>>>> with the G1 collector, with the same OOM result. I didn't save the log
>>>> from the G1 test, but I can quite easily redo the test with any set of
>>>> JVM parameters that may be helpful in debugging -- the OOM seems to be
>>>> easily and consistently reproducible with this application.
>>>>
>>>> Cheers,
>>>> Raman
>>>>
>>>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>>> GC overhead (%)
>>>>> Young generation: 2.65 (attempted to grow)
>>>>> Tenured generation: 0.54 (attempted to grow)
>>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>>> GC overhead (%)
>>>>> Young generation: 2.01 (attempted to grow)
>>>>> Tenured generation: 0.54 (no change)
>>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>>>
>>>>> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
>>>>> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
>>>>> for some reason that "failed" young gen collection triggers an immediate Full GC.
>>>>>
>>>>> Bug in the collector? Did you check the bug database?
>>>>>
>>>>> Regards,
>>>>> Kirk
>>>>>

  #12  
03-06-2011 06:40 AM

Raman,

The gc.log looks like it has the young collections
filtered out. Is that right? If so, please upload
the complete log.

Jon

On 05/27/11 10:57, Raman Gupta wrote:
> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>
> Linux RHEL 5.6
>
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>
> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>
> Startup parameters:
>
> -server
> -Xms256m
> -Xmx256m
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:MaxPermSize=64m
> -verbose:gc
> -XX:-UseGCOverheadLimit
> -XX:+DisableExplicitGC
> -XX:+UseParallelGC
> -XX:+UseFastAccessorMethods
> -XX:AdaptiveSizePolicyOutputInterval=1
> -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCApplicationConcurrentTime
> -XX:+PrintGCApplicationStoppedTime
>
> The complete GC log is available here:
>
> http://dl.dropbox.com/u/3430279/gc.log
>
> but here is a short snippet from that log before and after the OOM:
>
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
> Total time for which application threads were stopped: 1.2760680 seconds
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid18706.hprof ...
> Application time: 0.9442610 seconds
> Total time for which application threads were stopped: 2.4584870 seconds
> Heap dump file created [83874513 bytes in 3.330 secs]
> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>
> Note that:
>
> 1) I can reproduce this easily.
>
> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>
> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>
> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>
> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>
> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>
> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>
> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>
> I'm not sure what else to try or where else to look. Any suggestions?
>
> Cheers,
> Raman Gupta
+1, can't see anything in these logs that indicates you're heading to an OOME due to Java heap. But, incomplete log...

Regards,
Kirk

On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:

> Raman,
>
> The gc.log looks like it has the young collections
> filtered out. Is that right? If so, please upload
> the complete log.
>
> Jon
>
> On 05/27/11 10:57, Raman Gupta wrote:
>> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>>
>> Linux RHEL 5.6
>>
>> java version "1.6.0_24"
>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>>
>> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>>
>> Startup parameters:
>>
>> -server
>> -Xms256m
>> -Xmx256m
>> -XX:+HeapDumpOnOutOfMemoryError
>> -XX:MaxPermSize=64m
>> -verbose:gc
>> -XX:-UseGCOverheadLimit
>> -XX:+DisableExplicitGC
>> -XX:+UseParallelGC
>> -XX:+UseFastAccessorMethods
>> -XX:AdaptiveSizePolicyOutputInterval=1
>> -XX:+PrintGCDateStamps
>> -XX:+PrintGCTimeStamps
>> -XX:+PrintGCDetails
>> -XX:+PrintGCApplicationConcurrentTime
>> -XX:+PrintGCApplicationStoppedTime
>>
>> The complete GC log is available here:
>>
>> http://dl.dropbox.com/u/3430279/gc.log
>>
>> but here is a short snippet from that log before and after the OOM:
>>
>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>> Total time for which application threads were stopped: 1.2760680 seconds
>> java.lang.OutOfMemoryError: Java heap space
>> Dumping heap to java_pid18706.hprof ...
>> Application time: 0.9442610 seconds
>> Total time for which application threads were stopped: 2.4584870 seconds
>> Heap dump file created [83874513 bytes in 3.330 secs]
>> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>>
>> Note that:
>>
>> 1) I can reproduce this easily.
>>
>> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>>
>> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>>
>> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>>
>> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>>
>> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>>
>> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>>
>> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>>
>> I'm not sure what else to try or where else to look. Any suggestions?
>>
>> Cheers,
>> Raman Gupta

My bad -- I filtered out the young GCs with an errant grep command...
this log should be ok:

http://dl.dropbox.com/u/3430279/gc.log

This is from a different test run than the one before, but
demonstrates the same "problem".

Cheers,
Raman

On 06/01/2011 01:35 PM, Charles K Pepperdine wrote:
> +1, can't see anything in these logs that indicates you're heading to a OOME due to Java heap. But, incomplete log...
>
> Regards,
> Kirk
>
> On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:
>
>> Raman,
>>
>> The gc.log looks like it has the young collections
>> filtered out. Is that right? If so, please upload
>> the complete log.
>>
>> Jon
>>
>> On 05/27/11 10:57, Raman Gupta wrote:
>>> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>>>
>>> Linux RHEL 5.6
>>>
>>> java version "1.6.0_24"
>>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>>>
>>> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>>>
>>> Startup parameters:
>>>
>>> -server
>>> -Xms256m
>>> -Xmx256m
>>> -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:MaxPermSize=64m
>>> -verbose:gc
>>> -XX:-UseGCOverheadLimit
>>> -XX:+DisableExplicitGC
>>> -XX:+UseParallelGC
>>> -XX:+UseFastAccessorMethods
>>> -XX:AdaptiveSizePolicyOutputInterval=1
>>> -XX:+PrintGCDateStamps
>>> -XX:+PrintGCTimeStamps
>>> -XX:+PrintGCDetails
>>> -XX:+PrintGCApplicationConcurrentTime
>>> -XX:+PrintGCApplicationStoppedTime
>>>
>>> The complete GC log is available here:
>>>
>>> http://dl.dropbox.com/u/3430279/gc.log
>>>
>>> but here is a short snippet from that log before and after the OOM:
>>>
>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>> Total time for which application threads were stopped: 1.2760680 seconds
>>> java.lang.OutOfMemoryError: Java heap space
>>> Dumping heap to java_pid18706.hprof ...
>>> Application time: 0.9442610 seconds
>>> Total time for which application threads were stopped: 2.4584870 seconds
>>> Heap dump file created [83874513 bytes in 3.330 secs]
>>> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>>>
>>> Note that:
>>>
>>> 1) I can reproduce this easily.
>>>
>>> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>>>
>>> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>>>
>>> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>>>
>>> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>>>
>>> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>>>
>>> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>>>
>>> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>>>
>>> I'm not sure what else to try or where else to look. Any suggestions?
>>>
>>> Cheers,
>>> Raman Gupta
>
2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.65 (attempted to grow)
Tenured generation: 0.54 (attempted to grow)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.01 (attempted to grow)
Tenured generation: 0.54 (no change)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]

The first Full GC looks normal, as PSOldGen is (174784-169093)K = 5691K from being full. Nothing is happening in Perm.
The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the Full GC and yet no application thread allocated any memory out of young gen.
For some reason that "failed" young gen collection triggers an immediate Full GC.
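
To make the headroom arithmetic above repeatable across the whole log, here is a minimal Java sketch that pulls the PSOldGen figures out of a -XX:+PrintGCDetails line and computes the free space before and after the collection. The class and method names (OldGenHeadroom, parseOldGen, and so on) are made up for illustration, and the regex assumes the ParallelGC log format shown in this thread; other collectors print different records.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any JDK tool.
class OldGenHeadroom {
    // Matches e.g. "[PSOldGen: 169093K->62518K(174784K)]"
    private static final Pattern OLD_GEN =
            Pattern.compile("\\[PSOldGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    /** Returns {usedBefore, usedAfter, capacity} in KB, or null if the line has no PSOldGen entry. */
    public static long[] parseOldGen(String gcLine) {
        Matcher m = OLD_GEN.matcher(gcLine);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    /** Free old-gen space (KB) just before the collection ran. */
    public static long headroomBeforeKb(long[] oldGen) {
        return oldGen[2] - oldGen[0];
    }

    /** Free old-gen space (KB) once the collection finished. */
    public static long headroomAfterKb(long[] oldGen) {
        return oldGen[2] - oldGen[1];
    }

    public static void main(String[] args) {
        String firstFullGc = "[Full GC [PSYoungGen: 7662K->0K(72896K)] "
                + "[PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K)";
        long[] oldGen = parseOldGen(firstFullGc);
        System.out.println("before: " + headroomBeforeKb(oldGen) + "K");
        System.out.println("after:  " + headroomAfterKb(oldGen) + "K");
    }
}
```

For the first Full GC in the snippet this gives 5691K of headroom before the collection and 112266K after it, which is why an OOM a second later looks so odd.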

Bug in the collector? Did you check the bug database?

Regards,
Kirk

I did check the database but didn't find anything relevant. My search
terms may not be optimal, though I did scan through all the results
returned by "java.lang.OutOfMemoryError: Java heap space" as well as
"0K->0K".

I also suspected a bug in the collector and so I tried the same test
with the G1 collector, with the same OOM result. I didn't save the log
from the G1 test, but I can quite easily redo the test with any set of
JVM parameters that may be helpful in debugging -- the OOM seems to be
easily and consistently reproducible with this application.

Cheers,
Raman

On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.65 (attempted to grow)
> Tenured generation: 0.54 (attempted to grow)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.01 (attempted to grow)
> Tenured generation: 0.54 (no change)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>
> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
> for some reason that "failed" young gen collection triggers an immediate Full GC.
>
> Bug in the collector? Did you check the bug database?
>
> Regards,
> Kirk
>
Are you trying to create a humongous object or array? Accidentally?

On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:

> I did check the database but didn't find anything relevant. My search
> terms may not be optimal, though I did scan through all the results
> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
> "0K->0K".
>
> I also suspected a bug in the collector and so I tried the same test
> with the G1 collector, with the same OOM result. I didn't save the log
> from the G1 test, but I can quite easily redo the test with any set of
> JVM parameters that may be helpful in debugging -- the OOM seems to be
> easily and consistently reproducible with this application.
>
> Cheers,
> Raman
>
> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>> GC overhead (%)
>> Young generation: 2.65 (attempted to grow)
>> Tenured generation: 0.54 (attempted to grow)
>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>> GC overhead (%)
>> Young generation: 2.01 (attempted to grow)
>> Tenured generation: 0.54 (no change)
>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>
>> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
>> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
>> for some reason that "failed" young gen collection triggers an immediate Full GC.
>>
>> Bug in the collector? Did you check the bug database?
>>
>> Regards,
>> Kirk
>>

I do tend to think that somewhere a large object or array is being
created. In particular, Infinispan is one library we are using that
may be allocating large chunks of memory -- indeed, replacing
Infinispan with a local cache does seem to "fix" the problem.

However, more information from the JVM would really be useful in
isolating the offending code in Infinispan. Ideally,

a) any large allocations should show up as part of the heap dump if
the allocation succeeded but then some other subsequent code caused
the OOM, or

b) if the allocation itself failed, the OOM exception should include a
stack trace that would allow me to isolate the allocation point (as
it does normally, but for some reason in this case doesn't).

In this case the heap dump shows plenty of room in heap, and there is
no stack trace at the OOM, so I don't really have any way to isolate
the offending allocation point. In which situations does the OOM
exception get printed without an associated stack trace?

Cheers,
Raman


On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
> are you trying to create a humungous object or array? Accidentally?
>
> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>
>> I did check the database but didn't find anything relevant. My search
>> terms may not be optimal, though I did scan through all the results
>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>> "0K->0K".
>>
>> I also suspected a bug in the collector and so I tried the same test
>> with the G1 collector, with the same OOM result. I didn't save the log
>> from the G1 test, but I can quite easily redo the test with any set of
>> JVM parameters that may be helpful in debugging -- the OOM seems to be
>> easily and consistently reproducible with this application.
>>
>> Cheers,
>> Raman
>>
>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>> GC overhead (%)
>>> Young generation: 2.65 (attempted to grow)
>>> Tenured generation: 0.54 (attempted to grow)
>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>> GC overhead (%)
>>> Young generation: 2.01 (attempted to grow)
>>> Tenured generation: 0.54 (no change)
>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>
>>> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
>>> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
>>> for some reason that "failed" young gen collection triggers an immediate Full GC.
>>>
>>> Bug in the collector? Did you check the bug database?
>>>
>>> Regards,
>>> Kirk
>>>
>
If your code is not catching the OOM exception you'd
expect to see the stack trace when the program dies.
If it catches the exception and carries on, you'd want
it to print the exception detail. I don't know of
cases where the exception would just disappear.

In your case the report to stdout/stderr (?) that an OOM occurred and that
the heap is being dumped comes from inside the JVM
because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
After this point, your allocating thread would have gotten
an OOME which it probably caught and swallowed, and hence
the silence wrt the stack trace you would normally see. You
will want to look at your Infinispan code to see how
it deals with the inability to allocate said large objects.
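
As an illustration of the catch-and-swallow pattern described above, here is a small Java sketch. It is not taken from Infinispan or any other library; SwallowedOome and tryAllocate are hypothetical names, and the oversized request is simply an array length that HotSpot cannot satisfy.

```java
// Hypothetical demo of an allocation failure that leaves no stack trace behind.
class SwallowedOome {
    /** Tries to allocate a long[] of the given length; returns false if the JVM throws OutOfMemoryError. */
    public static boolean tryAllocate(int length) {
        try {
            long[] block = new long[length];
            return block.length == length;
        } catch (OutOfMemoryError e) {
            // Swallowed: no logging and no rethrow, so the application
            // itself never prints a stack trace for this failure.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryAllocate(1024));              // small request, succeeds
        System.out.println(tryAllocate(Integer.MAX_VALUE)); // oversized request, fails quietly
    }
}
```

The only trace the application leaves of the failed request is the boolean result. Any report of the OOM at that point has to come from the JVM itself, as with -XX:+HeapDumpOnOutOfMemoryError in this thread.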

Recall that object size is limited by the size of and
available space in the largest area (Eden or Old) in your
Java heap. As Kirk noted, the full gc was to attempt allocation
of an object that didn't fit into the available space in
Eden or in Old (so from that you can estimate the size of
the request).
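
A rough sketch of that estimate in Java, using the figures from the second Full GC in the log snippet. The names are hypothetical, and two assumptions are baked in: the old gen stays at the capacity shown in the log, and the PSYoungGen capacity is used as a stand-in for Eden (it also includes a survivor space, so it overstates free Eden). Since Old has the larger free area here, that overstatement does not change the result.

```java
// Hypothetical helper: lower bound on the size of the allocation that failed.
class RequestSizeEstimate {
    /**
     * If the request fit in neither area after a Full GC, it must be larger
     * than the bigger of the two free areas. All values are in KB.
     */
    public static long lowerBoundKb(long youngCapacityKb, long youngUsedKb,
                                    long oldCapacityKb, long oldUsedKb) {
        long youngFreeKb = youngCapacityKb - youngUsedKb; // overstates free Eden
        long oldFreeKb = oldCapacityKb - oldUsedKb;
        return Math.max(youngFreeKb, oldFreeKb);
    }

    public static void main(String[] args) {
        // [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)]
        long boundKb = lowerBoundKb(57856, 0, 174784, 52158);
        System.out.println("request was larger than " + boundKb + "K");
    }
}
```

With these numbers the bound is 122626K, i.e. a single request of roughly 120 MB or more against a 256 MB heap.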

Note also that the JDK libraries will resize hashtables under
you and that can also cause large allocation requests
(but I don't know how they handle OOMs resulting from such
allocations).

-- ramki

On 06/02/11 09:56, Raman Gupta wrote:
> I do tend to think that somewhere a large object or array is being
> created. In particular, Infinispan is one library we are using that
> may be allocating large chunks of memory -- indeed, replacing
> Infinispan with a local cache does seem to "fix" the problem.
>
> However, more information from the JVM would really be useful in
> isolating the offending code in Infinispan. Ideally,
>
> a) any large allocations should show up as part of the heap dump if
> the allocation succeeded but then some other subsequent code caused
> the OOM, or
>
> b) if the allocation itself failed, the OOM exception should include a
> stack trace that would allow me to isolate the allocation point (as
> it does normally, but for some reason in this case doesn't).
>
> In this case the heap dump shows plenty of room in heap, and there is
> no stack trace at the OOM, so I don't really have any way to isolate
> the offending allocation point. In which situations does the OOM
> exception get printed without an associated stack trace?
>
> Cheers,
> Raman
>
>
> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
>> are you trying to create a humungous object or array? Accidentally?
>>
>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>>
>>> I did check the database but didn't find anything relevant. My search
>>> terms may not be optimal, though I did scan through all the results
>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>>> "0K->0K".
>>>
>>> I also suspected a bug in the collector and so I tried the same test
>>> with the G1 collector, with the same OOM result. I didn't save the log
>>> from the G1 test, but I can quite easily redo the test with any set of
>>> JVM parameters that may be helpful in debugging -- the OOM seems to be
>>> easily and consistently reproducible with this application.
>>>
>>> Cheers,
>>> Raman
>>>
>>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>> GC overhead (%)
>>>> Young generation: 2.65 (attempted to grow)
>>>> Tenured generation: 0.54 (attempted to grow)
>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>> GC overhead (%)
>>>> Young generation: 2.01 (attempted to grow)
>>>> Tenured generation: 0.54 (no change)
>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>>
>>>> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
>>>> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
>>>> for some reason that "failed" young gen collection triggers an immediate Full GC.
>>>>
>>>> Bug in the collector? Did you check the bug database?
>>>>
>>>> Regards,
>>>> Kirk
>>>>
Well, GC is somewhat orthogonal to what your application is up to, except for this special case. I've cc'ed Manik on this one; maybe he's had someone run into it before.

Regards,
Kirk

On Jun 2, 2011, at 6:56 PM, Raman Gupta wrote:

> I do tend to think that somewhere a large object or array is being
> created. In particular, Infinispan is one library we are using that
> may be allocating large chunks of memory -- indeed, replacing
> Infinispan with a local cache does seem to "fix" the problem.
>
> However, more information from the JVM would really be useful in
> isolating the offending code in Infinispan. Ideally,
>
> a) any large allocations should show up as part of the heap dump if
> the allocation succeeded but then some other subsequent code caused
> the OOM, or
>
> b) if the allocation itself failed, the OOM exception should include a
> stack trace that would allow me to isolate the allocation point (as
> it does normally, but for some reason in this case doesn't).
>
> In this case the heap dump shows plenty of room in heap, and there is
> no stack trace at the OOM, so I don't really have any way to isolate
> the offending allocation point. In which situations does the OOM
> exception get printed without an associated stack trace?
>
> Cheers,
> Raman
>
>
> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
>> are you trying to create a humungous object or array? Accidentally?
>>
>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>>
>>> I did check the database but didn't find anything relevant. My search
>>> terms may not be optimal, though I did scan through all the results
>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>>> "0K->0K".
>>>
>>> I also suspected a bug in the collector and so I tried the same test
>>> with the G1 collector, with the same OOM result. I didn't save the log
>>> from the G1 test, but I can quite easily redo the test with any set of
>>> JVM parameters that may be helpful in debugging -- the OOM seems to be
>>> easily and consistently reproducible with this application.
>>>
>>> Cheers,
>>> Raman
>>>
>>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>> GC overhead (%)
>>>> Young generation: 2.65 (attempted to grow)
>>>> Tenured generation: 0.54 (attempted to grow)
>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>> GC overhead (%)
>>>> Young generation: 2.01 (attempted to grow)
>>>> Tenured generation: 0.54 (no change)
>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>>
>>>> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
>>>> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
>>>> for some reason that "failed" young gen collection triggers an immediate Full GC.
>>>>
>>>> Bug in the collector? Did you check the bug database?
>>>>
>>>> Regards,
>>>> Kirk
>>>>
>>

It would be *really* handy if there were a switch like:

-XX:+StackTraceOnOutOfMemoryError

to force the stack trace to be shown. Obviously looking at every line
of code of every library my application uses, including core JDK
libraries, for code paths where large amounts of heap may be allocated
and the associated OOME is caught and swallowed, is pretty much
impossible.

I think my next step is to increase the max heap size to a large value
which hopefully allows the large allocation to occur without failure,
and then periodically take heap dumps to isolate it.
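
One possible way to take those periodic dumps from inside the application is the HotSpotDiagnostic MXBean, which has been available since Java 6. This is only a sketch: PeriodicHeapDump, dumpOnce and start are hypothetical names, the file naming scheme is arbitrary, and the dump interval is whatever suits the test.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper; has to run inside the JVM being investigated.
class PeriodicHeapDump {
    private static final String BEAN_NAME = "com.sun.management:type=HotSpotDiagnostic";

    /** Writes one heap dump of live objects into dir and returns the file. */
    public static File dumpOnce(File dir, int sequence) {
        File out = new File(dir, "heap-" + sequence + "-" + System.currentTimeMillis() + ".hprof");
        try {
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(), BEAN_NAME,
                    HotSpotDiagnosticMXBean.class);
            // live=true: dump only objects that are still reachable
            bean.dumpHeap(out.getAbsolutePath(), true);
        } catch (IOException e) {
            throw new IllegalStateException("heap dump failed: " + out, e);
        }
        return out;
    }

    /** Starts a daemon thread that dumps the heap every intervalMillis. */
    public static Thread start(final File dir, final long intervalMillis) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                int sequence = 0;
                try {
                    while (true) {
                        dumpOnce(dir, sequence++);
                        Thread.sleep(intervalMillis);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // stop dumping when interrupted
                }
            }
        }, "periodic-heap-dump");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        System.out.println("wrote " + dumpOnce(dir, 0));
    }
}
```

Each dump pauses the application while it is written, so the interval is a trade-off between catching the large allocation while it is still reachable and disturbing the load test.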

Thanks,
Raman

On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
> If your code is not catching the OOM exception you'd
> expect to see the stack retrace when the program dies.
> If it catches the exception and carries on, you'd want
> it to print the exception detail. I don't know of
> cases where the exception would just disappear.
>
> In your case the report to stdout/stderr(?)that an OOM occurred and that
> the heap is being dumped comes from inside the JVM
> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
> After this point, your allocating thread would have gotten
> an OOME which it probably caught and swallowed, and hence
> the silence wrt the stack retrace you would normally see. You
> will want to look at your Infinispan code to see how
> it deals with the inability to allocate said large objects.
>
> Recall that object size is limited by the size of and
> available space in the largest area (Eden or Old) in your
> Java heap. As Kirk noted, the full gc was to attempt allocation
> of an object that didn't fit into the available space in
> Eden or in Old (so from that you can estimate the size of
> the request).
>
> Note also that the JDK libraries will resize hashtables under
> you and that can also cause large allocation requests
> (but i don't know how they handle OOM's resulting from such
> allocations).
>
> -- ramki
>
> On 06/02/11 09:56, Raman Gupta wrote:
>> I do tend to think that somewhere a large object or array is being
>> created. In particular, Infinispan is one library we are using that
>> may be allocating large chunks of memory -- indeed, replacing
>> Infinispan with a local cache does seem to "fix" the problem.
>>
>> However, more information from the JVM would really be useful in
>> isolating the offending code in Infinispan. Ideally,
>>
>> a) any large allocations should show up as part of the heap dump if
>> the allocation succeeded but then some other subsequent code caused
>> the OOM, or
>>
>> b) if the allocation itself failed, the OOM exception should include a
>> stack trace that would allow me to isolate the allocation point (as
>> it does normally, but for some reason in this case doesn't).
>>
>> In this case the heap dump shows plenty of room in heap, and there is
>> no stack trace at the OOM, so I don't really have any way to isolate
>> the offending allocation point. In which situations does the OOM
>> exception get printed without an associated stack trace?
>>
>> Cheers,
>> Raman
>>
>>
>> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
>>> are you trying to create a humungous object or array? Accidentally?
>>>
>>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>>>
>>>> I did check the database but didn't find anything relevant. My search
>>>> terms may not be optimal, though I did scan through all the results
>>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>>>> "0K->0K".
>>>>
>>>> I also suspected a bug in the collector and so I tried the same test
>>>> with the G1 collector, with the same OOM result. I didn't save the
>>>> log
>>>> from the G1 test, but I can quite easily redo the test with any
>>>> set of
>>>> JVM parameters that may be helpful in debugging -- the OOM seems
>>>> to be
>>>> easily and consistently reproducible with this application.
>>>>
>>>> Cheers,
>>>> Raman
>>>>
>>>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen:
>>>>> 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)]
>>>>> 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)],
>>>>> 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>>> GC overhead (%)
>>>>> Young generation: 2.65 (attempted to grow)
>>>>> Tenured generation: 0.54 (attempted to grow)
>>>>> Tenuring threshold: (attempted to decrease to balance GC
>>>>> costs) = 1
>>>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen:
>>>>> 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times:
>>>>> user=0.00 sys=0.00, real=0.00 secs] UseAdaptiveSizePolicy
>>>>> actions to meet *** throughput goal ***
>>>>> GC overhead (%)
>>>>> Young generation: 2.01 (attempted to grow)
>>>>> Tenured generation: 0.54 (no change)
>>>>> Tenuring threshold: (attempted to decrease to balance GC
>>>>> costs) = 1
>>>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen:
>>>>> 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)]
>>>>> 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)],
>>>>> 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>>> First Full GC looks normal as PSOldGen is (174784-169093)K =
>>>>> 5691K from being full. Nothing happening in Perm.
>>>>> The second is where things start to get weird. I don't see why
>>>>> that GC was called. Stranger still, it's ~800ms *after* the full
>>>>> gc and yet no application thread allocated any memory out of
>>>>> young gen.
>>>>> for some reason that "failed" young gen collection triggers an
>>>>> immediate Full GC.
>>>>>
>>>>> Bug in the collector? Did you check the bug database?
>>>>>
>>>>> Regards,
>>>>> Kirk
>>>>>
You "just" have to find all the places where OOME is caught.
Hopefully there aren't too many of those?
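
Where library sources are available, a first pass at finding those places could be a simple text scan. The sketch below is an assumption-laden illustration: OomeCatchScanner is a hypothetical name, the regex only recognises single-type catch clauses written on one line, and it also flags catch (Throwable ...) and catch (Error ...) since those catch OutOfMemoryError as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical scanner: reports lines whose catch clause would also catch OutOfMemoryError.
class OomeCatchScanner {
    private static final Pattern CATCH = Pattern.compile(
            "catch\\s*\\(\\s*(final\\s+)?(java\\.lang\\.)?"
            + "(OutOfMemoryError|VirtualMachineError|Error|Throwable)\\b");

    /** Returns the 1-based line numbers in source that contain a matching catch clause. */
    public static List<Integer> findCatchLines(String source) {
        List<Integer> hits = new ArrayList<Integer>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (CATCH.matcher(lines[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String source = "try {\n"
                + "} catch (OutOfMemoryError e) {\n"
                + "}\n"
                + "try {} catch (IOException e) {}\n"
                + "} catch (Throwable t) {\n";
        System.out.println(findCatchLines(source));
    }
}
```

Each hit still needs to be read by hand to see whether the error is logged, rethrown or silently dropped.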

-- ramki

On 6/2/2011 6:15 PM, Raman Gupta wrote:
> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError
>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.
>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.
>
> Thanks,
> Raman
>
> On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
>> If your code is not catching the OOM exception you'd
>> expect to see the stack retrace when the program dies.
>> If it catches the exception and carries on, you'd want
>> it to print the exception detail. I don't know of
>> cases where the exception would just disappear.
>>
>> In your case the report to stdout/stderr(?)that an OOM occurred and that
>> the heap is being dumped comes from inside the JVM
>> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
>> After this point, your allocating thread would have gotten
>> an OOME which it probably caught and swallowed, and hence
>> the silence wrt the stack retrace you would normally see. You
>> will want to look at your Infinispan code to see how
>> it deals with the inability to allocate said large objects.
>>
>> Recall that object size is limited by the size of and
>> available space in the largest area (Eden or Old) in your
>> Java heap. As Kirk noted, the full gc was to attempt allocation
>> of an object that didn't fit into the available space in
>> Eden or in Old (so from that you can estimate the size of
>> the request).
>>
>> Note also that the JDK libraries will resize hashtables under
>> you and that can also cause large allocation requests
>> (but i don't know how they handle OOM's resulting from such
>> allocations).
>>
>> -- ramki
>>
>> On 06/02/11 09:56, Raman Gupta wrote:
>>> I do tend to think that somewhere a large object or array is being
>>> created. In particular, Infinispan is one library we are using that
>>> may be allocating large chunks of memory -- indeed, replacing
>>> Infinispan with a local cache does seem to "fix" the problem.
>>>
>>> However, more information from the JVM would really be useful in
>>> isolating the offending code in Infinispan. Ideally,
>>>
>>> a) any large allocations should show up as part of the heap dump if
>>> the allocation succeeded but then some other subsequent code caused
>>> the OOM, or
>>>
>>> b) if the allocation itself failed, the OOM exception should include a
>>> stack trace that would allow me to isolate the allocation point (as
>>> it does normally, but for some reason in this case doesn't).
>>>
>>> In this case the heap dump shows plenty of room in heap, and there is
>>> no stack trace at the OOM, so I don't really have any way to isolate
>>> the offending allocation point. In which situations does the OOM
>>> exception get printed without an associated stack trace?
>>>
>>> Cheers,
>>> Raman
>>>
>>>
>>> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
>>>> are you trying to create a humungous object or array? Accidentally?
>>>>
>>>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>>>>
>>>>> I did check the database but didn't find anything relevant. My search
>>>>> terms may not be optimal, though I did scan through all the results
>>>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>>>>> "0K->0K".
>>>>>
>>>>> I also suspected a bug in the collector and so I tried the same test
>>>>> with the G1 collector, with the same OOM result. I didn't save the
>>>>> log
>>>>> from the G1 test, but I can quite easily redo the test with any
>>>>> set of
>>>>> JVM parameters that may be helpful in debugging -- the OOM seems
>>>>> to be
>>>>> easily and consistently reproducible with this application.
>>>>>
>>>>> Cheers,
>>>>> Raman
>>>>>
>>>>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>>>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen:
>>>>>> 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)]
>>>>>> 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)],
>>>>>> 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>>>> GC overhead (%)
>>>>>> Young generation: 2.65 (attempted to grow)
>>>>>> Tenured generation: 0.54 (attempted to grow)
>>>>>> Tenuring threshold: (attempted to decrease to balance GC
>>>>>> costs) = 1
>>>>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen:
>>>>>> 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times:
>>>>>> user=0.00 sys=0.00, real=0.00 secs] UseAdaptiveSizePolicy
>>>>>> actions to meet *** throughput goal ***
>>>>>> GC overhead (%)
>>>>>> Young generation: 2.01 (attempted to grow)
>>>>>> Tenured generation: 0.54 (no change)
>>>>>> Tenuring threshold: (attempted to decrease to balance GC
>>>>>> costs) = 1
>>>>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen:
>>>>>> 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)]
>>>>>> 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)],
>>>>>> 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>>>> First Full GC looks normal as PSOldGen is (174784-169093)K =
>>>>>> 5691K from being full. Nothing happening in Perm.
>>>>>> The second is where things start to get weird. I don't see why
>>>>>> that GC was called. Stranger still, it's ~800ms *after* the full
>>>>>> gc and yet no application thread allocated any memory out of
>>>>>> young gen.
>>>>>> for some reason that "failed" young gen collection triggers an
>>>>>> immediate Full GC.
>>>>>>
>>>>>> Bug in the collector? Did you check the bug database?
>>>>>>
>>>>>> Regards,
>>>>>> Kirk
>>>>>>


  #13  
03-06-2011 06:54 AM
Hotspot-gc-dev member admin is online now
User
 

)
Raman,

The gc.log looks like it has the young collections
filtered out. Is that right? If so, please upload
the complete log.

Jon

)
+1, can't see anything in these logs that indicates you're heading to an OOME due to Java heap. But, incomplete log...

Regards,
Kirk

On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:

> Raman,
>
> The gc.log looks like it has the young collections
> filtered out. Is that right? If so, please upload
> the complete log.
>
> Jon
)
My bad -- I filtered out the young GCs with an errant grep command...
this log should be ok:

http://dl.dropbox.com/u/3430279/gc.log

This is from a different test run than the one before, but
demonstrates the same "problem".

Cheers,
Raman

On 06/01/2011 01:35 PM, Charles K Pepperdine wrote:
> +1, can't see anything in these logs that indicates you're heading to an OOME due to Java heap. But, incomplete log...
>
> Regards,
> Kirk
)
2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.65 (attempted to grow)
Tenured generation: 0.54 (attempted to grow)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.01 (attempted to grow)
Tenured generation: 0.54 (no change)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]

The first Full GC looks normal, as PSOldGen is (174784-169093)K = 5691K from being full. Nothing is happening in Perm.
The second collection is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the Full GC and yet no application thread allocated any memory out of young gen.
For some reason that "failed" young gen collection triggers an immediate Full GC.
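
The headroom arithmetic above can be reproduced mechanically. What follows is only a minimal sketch: the class name GcLogHeadroom and its methods are made up for illustration, and the regular expression assumes the PSOldGen entry looks exactly as it does in the log lines quoted in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any JDK tool.
class GcLogHeadroom {
    // Matches an entry such as "[PSOldGen: 169093K->62518K(174784K)]".
    private static final Pattern OLD_GEN =
            Pattern.compile("\\[PSOldGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    /** Returns {usedBeforeKb, usedAfterKb, capacityKb}, or null if the line has no PSOldGen entry. */
    static long[] parseOldGen(String line) {
        Matcher m = OLD_GEN.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    public static void main(String[] args) {
        String line = "2011-05-27T11:35:18.214-0400: 147.823: [Full GC "
                + "[PSYoungGen: 7662K->0K(72896K)] "
                + "[PSOldGen: 169093K->62518K(174784K)] "
                + "176755K->62518K(247680K) "
                + "[PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs]";
        long[] v = parseOldGen(line);
        System.out.println("old gen free before GC: " + (v[2] - v[0]) + "K"); // 5691K
        System.out.println("old gen free after GC:  " + (v[2] - v[1]) + "K"); // 112266K
    }
}
```

Running the same helper over every Full GC line in a log gives a quick view of how close the old generation was to full each time.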

Bug in the collector? Did you check the bug database?

Regards,
Kirk

)
I did check the database but didn't find anything relevant. My search
terms may not be optimal, though I did scan through all the results
returned by "java.lang.OutOfMemoryError: Java heap space" as well as
"0K->0K".

I also suspected a bug in the collector and so I tried the same test
with the G1 collector, with the same OOM result. I didn't save the log
from the G1 test, but I can quite easily redo the test with any set of
JVM parameters that may be helpful in debugging -- the OOM seems to be
easily and consistently reproducible with this application.

Cheers,
Raman

)
Are you trying to create a humungous object or array? Accidentally?

On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:

> I did check the database but didn't find anything relevant. My search
> terms may not be optimal, though I did scan through all the results
> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
> "0K->0K".
>
> I also suspected a bug in the collector and so I tried the same test
> with the G1 collector, with the same OOM result. I didn't save the log
> from the G1 test, but I can quite easily redo the test with any set of
> JVM parameters that may be helpful in debugging -- the OOM seems to be
> easily and consistently reproducible with this application.
>
> Cheers,
> Raman
)
I do tend to think that somewhere a large object or array is being
created. In particular, Infinispan is one library we are using that
may be allocating large chunks of memory -- indeed, replacing
Infinispan with a local cache does seem to "fix" the problem.

However, more information from the JVM would really be useful in
isolating the offending code in Infinispan. Ideally,

a) any large allocations should show up as part of the heap dump if
the allocation succeeded but then some other subsequent code caused
the OOM, or

b) if the allocation itself failed, the OOM exception should include a
stack trace that would allow me to isolate the allocation point (as
it does normally, but for some reason in this case doesn't).

In this case the heap dump shows plenty of room in heap, and there is
no stack trace at the OOM, so I don't really have any way to isolate
the offending allocation point. In which situations does the OOM
exception get printed without an associated stack trace?

Cheers,
Raman


On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
> are you trying to create a humungous object or array? Accidentally?
)
If your code is not catching the OOM exception you'd
expect to see the stack retrace when the program dies.
If it catches the exception and carries on, you'd want
it to print the exception detail. I don't know of
cases where the exception would just disappear.

In your case the report to stdout/stderr(?) that an OOM occurred and that
the heap is being dumped comes from inside the JVM
because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
After this point, your allocating thread would have gotten
an OOME which it probably caught and swallowed, and hence
the silence wrt the stack retrace you would normally see. You
will want to look at your Infinispan code to see how
it deals with the inability to allocate said large objects.
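
To illustrate the "caught and swallowed" scenario described above, here is a minimal sketch. The class SwallowedOome and both methods are hypothetical and are not taken from Infinispan or any other library. The array length Integer.MAX_VALUE is chosen only because HotSpot normally rejects it immediately with an OutOfMemoryError; the error message may differ from "Java heap space".

```java
// Hypothetical illustration of a swallowed OutOfMemoryError.
class SwallowedOome {
    /** Swallows the error: the caller sees null and nothing is printed. */
    static long[] allocateQuietly(int length) {
        try {
            return new long[length];
        } catch (OutOfMemoryError e) {
            return null; // no log, no rethrow: the stack trace is lost
        }
    }

    /** Same recovery, but the failure is reported before carrying on. */
    static long[] allocateLoudly(int length) {
        try {
            return new long[length];
        } catch (OutOfMemoryError e) {
            e.printStackTrace(); // goes to stderr and shows the allocation site
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("quiet result: " + allocateQuietly(Integer.MAX_VALUE));
        System.out.println("loud result:  " + allocateLoudly(Integer.MAX_VALUE));
    }
}
```

With -XX:+HeapDumpOnOutOfMemoryError set, the JVM would still be expected to announce the error and write the dump in the quiet case, which would match the silent log seen here.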

Recall that object size is limited by the size of and
available space in the largest area (Eden or Old) in your
Java heap. As Kirk noted, the full gc was to attempt allocation
of an object that didn't fit into the available space in
Eden or in Old (so from that you can estimate the size of
the request).
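
A rough version of that estimate, using the figures from the second Full GC in the log, might look like the sketch below. The class RequestSizeEstimate is hypothetical. It assumes the old generation could not grow beyond the 174784K capacity shown in the log, and it treats the reported young generation capacity only as an upper bound on Eden, so only the old generation figure is used as a lower bound on the request.

```java
// Hypothetical back-of-the-envelope estimate, not a JVM facility.
class RequestSizeEstimate {
    /** Free space in KB left in a generation after a collection. */
    static long freeKb(long capacityKb, long usedAfterKb) {
        return capacityKb - usedAfterKb;
    }

    public static void main(String[] args) {
        // From the log: [PSOldGen: 62518K->52158K(174784K)]
        long oldFreeKb = freeKb(174784, 52158);
        // From the log: [PSYoungGen: 0K->0K(57856K)]; Eden is only part of this.
        long youngFreeKb = freeKb(57856, 0);

        System.out.println("old gen free:   " + oldFreeKb + "K");   // 122626K
        System.out.println("young gen free: " + youngFreeKb + "K"); // 57856K
        // If the request did not fit in old gen, it was larger than this:
        System.out.printf("request larger than about %.1f MB%n", oldFreeKb / 1024.0); // 119.8 MB
    }
}
```

If this reading is right, the failing request was a single allocation of well over 100 MB, which a 256 MB heap split into generations cannot satisfy.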

Note also that the JDK libraries will resize hashtables under
you and that can also cause large allocation requests
(but I don't know how they handle OOMs resulting from such
allocations).
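
As a rough illustration of how large such a resize request can get, the sketch below computes the backing array size for a table that doubles in the way java.util.HashMap does (default initial capacity 16, default load factor 0.75). The class HashTableResizeCost is hypothetical, the reference size of 4 or 8 bytes is an assumption that depends on the JVM and its settings, and array header overhead is ignored.

```java
// Hypothetical calculator for the size of a hash table's backing array.
class HashTableResizeCost {
    /** Smallest power-of-two table length (starting at 16) whose threshold holds the given entries. */
    static long tableLengthFor(long entries, double loadFactor) {
        long length = 16; // java.util.HashMap default initial capacity
        while (entries > (long) (length * loadFactor)) {
            length <<= 1; // the table doubles on each resize
        }
        return length;
    }

    /** Approximate bytes for the table array alone, ignoring the array header. */
    static long tableBytes(long length, int bytesPerReference) {
        return length * bytesPerReference;
    }

    public static void main(String[] args) {
        long length = tableLengthFor(10000000L, 0.75);
        System.out.println("table length: " + length);                         // 16777216
        System.out.println("bytes (8-byte refs): " + tableBytes(length, 8));   // 134217728
        System.out.println("bytes (4-byte refs): " + tableBytes(length, 4));   // 67108864
    }
}
```

Note that the old and new tables both exist while entries are being transferred, so the peak demand during a resize is higher than the new array alone.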

-- ramki

On 06/02/11 09:56, Raman Gupta wrote:
> I do tend to think that somewhere a large object or array is being
> created. In particular, Infinispan is one library we are using that
> may be allocating large chunks of memory -- indeed, replacing
> Infinispan with a local cache does seem to "fix" the problem.
>
> However, more information from the JVM would really be useful in
> isolating the offending code in Infinispan. Ideally,
>
> a) any large allocations should show up as part of the heap dump if
> the allocation succeeded but then some other subsequent code caused
> the OOM, or
>
> b) if the allocation itself failed, the OOM exception should include a
> stack trace that would allow me to isolate the allocation point (as
> it does normally, but for some reason in this case doesn't).
>
> In this case the heap dump shows plenty of room in heap, and there is
> no stack trace at the OOM, so I don't really have any way to isolate
> the offending allocation point. In which situations does the OOM
> exception get printed without an associated stack trace?
>
> Cheers,
> Raman
)
Well, GC is somewhat orthogonal to what your application is up to, except for this special case. I've cc'ed Manik on this one; maybe he's had someone run into it before.

Regards,
Kirk

On Jun 2, 2011, at 6:56 PM, Raman Gupta wrote:

> I do tend to think that somewhere a large object or array is being
> created. In particular, Infinispan is one library we are using that
> may be allocating large chunks of memory -- indeed, replacing
> Infinispan with a local cache does seem to "fix" the problem.
>
> However, more information from the JVM would really be useful in
> isolating the offending code in Infinispan. Ideally,
>
> a) any large allocations should show up as part of the heap dump if
> the allocation succeeded but then some other subsequent code caused
> the OOM, or
>
> b) if the allocation itself failed, the OOM exception should include a
> stack trace that would allow me to isolate the allocation point (as
> it does normally, but for some reason in this case doesn't).
>
> In this case the heap dump shows plenty of room in heap, and there is
> no stack trace at the OOM, so I don't really have any way to isolate
> the offending allocation point. In which situations does the OOM
> exception get printed without an associated stack trace?
>
> Cheers,
> Raman
)
It would be *really* handy if there were a switch like:

-XX:+StackTraceOnOutOfMemoryError

to force the stack trace to be shown. Obviously looking at every line
of code of every library my application uses, including core JDK
libraries, for code paths where large amounts of heap may be allocated
and the associated OOME is caught and swallowed, is pretty much
impossible.

I think my next step is to increase the max heap size to a large value
which hopefully allows the large allocation to occur without failure,
and then periodically take heap dumps to isolate it.
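
For what it's worth, the periodic dumps could be taken from inside the JVM rather than with an external tool. Below is a minimal sketch of that idea, not something I have run against this application. It assumes a HotSpot JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class name PeriodicHeapDumper and the file naming scheme are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration; not part of the application in this thread.
final class PeriodicHeapDumper {

    // Writes a single heap dump into dir and returns the file that was created.
    static File dumpOnce(File dir, int sequence) throws IOException {
        // getPlatformMXBean(Class) needs Java 7 or later. On Java 6 one would use
        // ManagementFactory.newPlatformMXBeanProxy(...) with the object name
        // "com.sun.management:type=HotSpotDiagnostic" instead.
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        File out = new File(dir, "heap-" + sequence + ".hprof");
        // live=false asks for all objects, not only reachable ones, so a
        // short-lived large array has a chance of showing up in the dump.
        bean.dumpHeap(out.getAbsolutePath(), false);
        return out;
    }

    // Starts a daemon thread that takes `count` dumps, one every intervalMillis.
    static Thread start(final File dir, final long intervalMillis, final int count) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < count; i++) {
                        Thread.sleep(intervalMillis);
                        System.err.println("Wrote " + dumpOnce(dir, i));
                    }
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop dumping.
                    Thread.currentThread().interrupt();
                } catch (IOException e) {
                    // dumpHeap refuses to overwrite an existing file, among other failures.
                    System.err.println("Heap dump failed: " + e);
                }
            }
        }, "periodic-heap-dumper");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Calling PeriodicHeapDumper.start(new File("/tmp/dumps"), 5000L, 20) early in startup would then leave a series of dumps to compare; the directory, interval and count are placeholders.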

Thanks,
Raman

On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
> If your code is not catching the OOM exception you'd
> expect to see the stack retrace when the program dies.
> If it catches the exception and carries on, you'd want
> it to print the exception detail. I don't know of
> cases where the exception would just disappear.
>
> In your case the report to stdout/stderr(?)that an OOM occurred and that
> the heap is being dumped comes from inside the JVM
> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
> After this point, your allocating thread would have gotten
> an OOME which it probably caught and swallowed, and hence
> the silence wrt the stack retrace you would normally see. You
> will want to look at your Infinispan code to see how
> it deals with the inability to allocate said large objects.
>
> Recall that object size is limited by the size of and
> available space in the largest area (Eden or Old) in your
> Java heap. As Kirk noted, the full gc was to attempt allocation
> of an object that didn't fit into the available space in
> Eden or in Old (so from that you can estimate the size of
> the request).
>
> Note also that the JDK libraries will resize hashtables under
> you and that can also cause large allocation requests
> (but i don't know how they handle OOM's resulting from such
> allocations).
>
> -- ramki
>
> On 06/02/11 09:56, Raman Gupta wrote:
>> I do tend to think that somewhere a large object or array is being
>> created. In particular, Infinispan is one library we are using that
>> may be allocating large chunks of memory -- indeed, replacing
>> Infinispan with a local cache does seem to "fix" the problem.
>>
>> However, more information from the JVM would really be useful in
>> isolating the offending code in Infinispan. Ideally,
>>
>> a) any large allocations should show up as part of the heap dump if
>> the allocation succeeded but then some other subsequent code caused
>> the OOM, or
>>
>> b) if the allocation itself failed, the OOM exception should include a
>> stack trace that would allow me to isolate the allocation point (as
>> it does normally, but for some reason in this case doesn't).
>>
>> In this case the heap dump shows plenty of room in heap, and there is
>> no stack trace at the OOM, so I don't really have any way to isolate
>> the offending allocation point. In which situations does the OOM
>> exception get printed without an associated stack trace?
>>
>> Cheers,
>> Raman
>>
>>
>> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
>>> are you trying to create a humungous object or array? Accidentally?
>>>
>>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>>>
>>>> I did check the database but didn't find anything relevant. My search
>>>> terms may not be optimal, though I did scan through all the results
>>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>>>> "0K->0K".
>>>>
>>>> I also suspected a bug in the collector and so I tried the same test
>>>> with the G1 collector, with the same OOM result. I didn't save the log
>>>> from the G1 test, but I can quite easily redo the test with any set of
>>>> JVM parameters that may be helpful in debugging -- the OOM seems to be
>>>> easily and consistently reproducible with this application.
>>>>
>>>> Cheers,
>>>> Raman
>>>>
>>>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>>> GC overhead (%)
>>>>> Young generation: 2.65 (attempted to grow)
>>>>> Tenured generation: 0.54 (attempted to grow)
>>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>>> GC overhead (%)
>>>>> Young generation: 2.01 (attempted to grow)
>>>>> Tenured generation: 0.54 (no change)
>>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>>>
>>>>> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
>>>>> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
>>>>> for some reason that "failed" young gen collection triggers an immediate Full GC.
>>>>>
>>>>> Bug in the collector? Did you check the bug database?
>>>>>
>>>>> Regards,
>>>>> Kirk
>>>>>
)
You "just" have to find all the places where OOME is caught.
Hopefully there aren't too many of those?

-- ramki

On 6/2/2011 6:15 PM, Raman Gupta wrote:
> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError
>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.
>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.
>
> Thanks,
> Raman
>
> On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
>> If your code is not catching the OOM exception you'd
>> expect to see the stack retrace when the program dies.
>> If it catches the exception and carries on, you'd want
>> it to print the exception detail. I don't know of
>> cases where the exception would just disappear.
>>
>> In your case the report to stdout/stderr(?)that an OOM occurred and that
>> the heap is being dumped comes from inside the JVM
>> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
>> After this point, your allocating thread would have gotten
>> an OOME which it probably caught and swallowed, and hence
>> the silence wrt the stack retrace you would normally see. You
>> will want to look at your Infinispan code to see how
>> it deals with the inability to allocate said large objects.
>>
>> Recall that object size is limited by the size of and
>> available space in the largest area (Eden or Old) in your
>> Java heap. As Kirk noted, the full gc was to attempt allocation
>> of an object that didn't fit into the available space in
>> Eden or in Old (so from that you can estimate the size of
>> the request).
>>
>> Note also that the JDK libraries will resize hashtables under
>> you and that can also cause large allocation requests
>> (but i don't know how they handle OOM's resulting from such
>> allocations).
>>
>> -- ramki
>>
>> On 06/02/11 09:56, Raman Gupta wrote:
>>> I do tend to think that somewhere a large object or array is being
>>> created. In particular, Infinispan is one library we are using that
>>> may be allocating large chunks of memory -- indeed, replacing
>>> Infinispan with a local cache does seem to "fix" the problem.
>>>
>>> However, more information from the JVM would really be useful in
>>> isolating the offending code in Infinispan. Ideally,
>>>
>>> a) any large allocations should show up as part of the heap dump if
>>> the allocation succeeded but then some other subsequent code caused
>>> the OOM, or
>>>
>>> b) if the allocation itself failed, the OOM exception should include a
>>> stack trace that would allow me to isolate the allocation point (as
>>> it does normally, but for some reason in this case doesn't).
>>>
>>> In this case the heap dump shows plenty of room in heap, and there is
>>> no stack trace at the OOM, so I don't really have any way to isolate
>>> the offending allocation point. In which situations does the OOM
>>> exception get printed without an associated stack trace?
>>>
>>> Cheers,
>>> Raman
>>>
>>>
>>> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
>>>> are you trying to create a humungous object or array? Accidentally?
>>>>
>>>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>>>>
>>>>> I did check the database but didn't find anything relevant. My search
>>>>> terms may not be optimal, though I did scan through all the results
>>>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>>>>> "0K->0K".
>>>>>
>>>>> I also suspected a bug in the collector and so I tried the same test
>>>>> with the G1 collector, with the same OOM result. I didn't save the log
>>>>> from the G1 test, but I can quite easily redo the test with any set of
>>>>> JVM parameters that may be helpful in debugging -- the OOM seems to be
>>>>> easily and consistently reproducible with this application.
>>>>>
>>>>> Cheers,
>>>>> Raman
>>>>>
>>>>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>>>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>>>> GC overhead (%)
>>>>>> Young generation: 2.65 (attempted to grow)
>>>>>> Tenured generation: 0.54 (attempted to grow)
>>>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>>>> GC overhead (%)
>>>>>> Young generation: 2.01 (attempted to grow)
>>>>>> Tenured generation: 0.54 (no change)
>>>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>>>>
>>>>>> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
>>>>>> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
>>>>>> for some reason that "failed" young gen collection triggers an immediate Full GC.
>>>>>>
>>>>>> Bug in the collector? Did you check the bug database?
>>>>>>
>>>>>> Regards,
>>>>>> Kirk
>>>>>>

)
Sorry, sent previous email without addressing all of the issues.

On 6/2/2011 6:15 PM, Raman Gupta wrote:
> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError

Yes, that would be handy, and probably not too difficult.
But I wonder also if something like -XX:OnOutOfMemoryError or
the like would already get you enough info to get close to
the problem ... (although, because the command is executed in
a separate shell, by the time it runs the process has likely
gone well past the point when the problem occurred).

>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.

Try to look for places where OOME (or a supertype such as
Error or Throwable) is caught. I am hoping there aren't too
many of those...
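
To make the "caught and swallowed" pattern concrete, here is a minimal sketch of what such code might look like and how the same allocation could report the failure instead. The class and method names are hypothetical; this is not taken from Infinispan or the JDK, and the oversized array length is only a convenient way to force the error.

```java
// Hypothetical illustration of a swallowed OutOfMemoryError; not real library code.
final class SwallowedOomDemo {

    // Mimics code that catches Throwable around an allocation and carries on.
    // The caller never sees the error and nothing is printed.
    static boolean allocateQuietly(int length) {
        try {
            long[] big = new long[length];
            return big.length == length;
        } catch (Throwable t) {
            // Swallowed: the OutOfMemoryError ends here, so no stack trace appears.
            return false;
        }
    }

    // Same allocation, but the error is reported before being handled.
    static String allocateAndReport(int length) {
        try {
            long[] big = new long[length];
            return "ok: " + big.length;
        } catch (OutOfMemoryError e) {
            // Printing here is what makes the allocation site visible in the logs.
            e.printStackTrace();
            return e.getClass().getName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A long[] of Integer.MAX_VALUE elements is assumed to be too large
        // for any heap this demo is run with; the exact message is JVM-specific.
        System.out.println("quiet result: " + allocateQuietly(Integer.MAX_VALUE));
        System.out.println("reported:     " + allocateAndReport(Integer.MAX_VALUE));
    }
}
```

Searching the code base for catch blocks on Throwable, Error, VirtualMachineError or OutOfMemoryError that look like the first method is the kind of audit meant above.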

>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.

Yes, that seems reasonable, or maybe use an allocation profiler
with the larger heap and find it that way...

-- ramki

>
> Thanks,
> Raman
>
> On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
>> If your code is not catching the OOM exception you'd
>> expect to see the stack retrace when the program dies.
>> If it catches the exception and carries on, you'd want
>> it to print the exception detail. I don't know of
>> cases where the exception would just disappear.
>>
>> In your case the report to stdout/stderr(?)that an OOM occurred and that
>> the heap is being dumped comes from inside the JVM
>> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
>> After this point, your allocating thread would have gotten
>> an OOME which it probably caught and swallowed, and hence
>> the silence wrt the stack retrace you would normally see. You
>> will want to look at your Infinispan code to see how
>> it deals with the inability to allocate said large objects.
>>
>> Recall that object size is limited by the size of and
>> available space in the largest area (Eden or Old) in your
>> Java heap. As Kirk noted, the full gc was to attempt allocation
>> of an object that didn't fit into the available space in
>> Eden or in Old (so from that you can estimate the size of
>> the request).
>>
>> Note also that the JDK libraries will resize hashtables under
>> you and that can also cause large allocation requests
>> (but i don't know how they handle OOM's resulting from such
>> allocations).
>>
>> -- ramki
>>
>> On 06/02/11 09:56, Raman Gupta wrote:
>>> I do tend to think that somewhere a large object or array is being
>>> created. In particular, Infinispan is one library we are using that
>>> may be allocating large chunks of memory -- indeed, replacing
>>> Infinispan with a local cache does seem to "fix" the problem.
>>>
>>> However, more information from the JVM would really be useful in
>>> isolating the offending code in Infinispan. Ideally,
>>>
>>> a) any large allocations should show up as part of the heap dump if
>>> the allocation succeeded but then some other subsequent code caused
>>> the OOM, or
>>>
>>> b) if the allocation itself failed, the OOM exception should include a
>>> stack trace that would allow me to isolate the allocation point (as
>>> it does normally, but for some reason in this case doesn't).
>>>
>>> In this case the heap dump shows plenty of room in heap, and there is
>>> no stack trace at the OOM, so I don't really have any way to isolate
>>> the offending allocation point. In which situations does the OOM
>>> exception get printed without an associated stack trace?
>>>
>>> Cheers,
>>> Raman
>>>
>>>
>>> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
>>>> are you trying to create a humungous object or array? Accidentally?
>>>>
>>>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>>>>
>>>>> I did check the database but didn't find anything relevant. My search
>>>>> terms may not be optimal, though I did scan through all the results
>>>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>>>>> "0K->0K".
>>>>>
>>>>> I also suspected a bug in the collector and so I tried the same test
>>>>> with the G1 collector, with the same OOM result. I didn't save the log
>>>>> from the G1 test, but I can quite easily redo the test with any set of
>>>>> JVM parameters that may be helpful in debugging -- the OOM seems to be
>>>>> easily and consistently reproducible with this application.
>>>>>
>>>>> Cheers,
>>>>> Raman
>>>>>
>>>>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>>>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>>>> GC overhead (%)
>>>>>> Young generation: 2.65 (attempted to grow)
>>>>>> Tenured generation: 0.54 (attempted to grow)
>>>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>>>> GC overhead (%)
>>>>>> Young generation: 2.01 (attempted to grow)
>>>>>> Tenured generation: 0.54 (no change)
>>>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>>>>
>>>>>> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
>>>>>> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
>>>>>> for some reason that "failed" young gen collection triggers an immediate Full GC.
>>>>>>
>>>>>> Bug in the collector? Did you check the bug database?
>>>>>>
>>>>>> Regards,
>>>>>> Kirk
>>>>>>

)

  #14  
03-06-2011 07:35 AM
 

)
Raman,

The gc.log looks like it has the young collections
filtered out. Is that right? If so, please upload
the complete log.

Jon

On 05/27/11 10:57, Raman Gupta wrote:
> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>
> Linux RHEL 5.6
>
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>
> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>
> Startup parameters:
>
> -server
> -Xms256m
> -Xmx256m
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:MaxPermSize=64m
> -verbose:gc
> -XX:-UseGCOverheadLimit
> -XX:+DisableExplicitGC
> -XX:+UseParallelGC
> -XX:+UseFastAccessorMethods
> -XX:AdaptiveSizePolicyOutputInterval=1
> -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCApplicationConcurrentTime
> -XX:+PrintGCApplicationStoppedTime
>
> The complete GC log is available here:
>
> http://dl.dropbox.com/u/3430279/gc.log
>
> but here is a short snippet from that log before and after the OOM:
>
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
> Total time for which application threads were stopped: 1.2760680 seconds
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid18706.hprof ...
> Application time: 0.9442610 seconds
> Total time for which application threads were stopped: 2.4584870 seconds
> Heap dump file created [83874513 bytes in 3.330 secs]
> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>
> Note that:
>
> 1) I can reproduce this easily.
>
> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>
> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>
> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>
> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>
> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>
> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>
> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>
> I'm not sure what else to try or where else to look. Any suggestions?
>
> Cheers,
> Raman Gupta
)
+1, I can't see anything in these logs that indicates you're heading for an OOME due to Java heap. But the log is incomplete...

Regards,
Kirk

On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:

> Raman,
>
> The gc.log looks like it has the young collections
> filtered out. Is that right? If so, please upload
> the complete log.
>
> Jon
>
> On 05/27/11 10:57, Raman Gupta wrote:
>> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>>
>> Linux RHEL 5.6
>>
>> java version "1.6.0_24"
>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>>
>> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>>
>> Startup parameters:
>>
>> -server
>> -Xms256m
>> -Xmx256m
>> -XX:+HeapDumpOnOutOfMemoryError
>> -XX:MaxPermSize=64m
>> -verbose:gc
>> -XX:-UseGCOverheadLimit
>> -XX:+DisableExplicitGC
>> -XX:+UseParallelGC
>> -XX:+UseFastAccessorMethods
>> -XX:AdaptiveSizePolicyOutputInterval=1
>> -XX:+PrintGCDateStamps
>> -XX:+PrintGCTimeStamps
>> -XX:+PrintGCDetails
>> -XX:+PrintGCApplicationConcurrentTime
>> -XX:+PrintGCApplicationStoppedTime
>>
>> The complete GC log is available here:
>>
>> http://dl.dropbox.com/u/3430279/gc.log
>>
>> but here is a short snippet from that log before and after the OOM:
>>
>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>> Total time for which application threads were stopped: 1.2760680 seconds
>> java.lang.OutOfMemoryError: Java heap space
>> Dumping heap to java_pid18706.hprof ...
>> Application time: 0.9442610 seconds
>> Total time for which application threads were stopped: 2.4584870 seconds
>> Heap dump file created [83874513 bytes in 3.330 secs]
>> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>>
>> Note that:
>>
>> 1) I can reproduce this easily.
>>
>> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>>
>> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>>
>> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>>
>> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>>
>> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>>
>> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>>
>> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>>
>> I'm not sure what else to try or where else to look. Any suggestions?
>>
>> Cheers,
>> Raman Gupta

)
My bad -- I filtered out the young GCs with an errant grep command...
this log should be ok:

http://dl.dropbox.com/u/3430279/gc.log

This is from a different test run than the one before, but
demonstrates the same "problem".

Cheers,
Raman

On 06/01/2011 01:35 PM, Charles K Pepperdine wrote:
> +1, can't see anything in these logs that indicates you're heading to a OOME due to Java heap. But, incomplete log...
>
> Regards,
> Kirk
>
> On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:
>
>> Raman,
>>
>> The gc.log looks like it has the young collections
>> filtered out. Is that right? If so, please upload
>> the complete log.
>>
>> Jon
>>
>> On 05/27/11 10:57, Raman Gupta wrote:
>>> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>>>
>>> Linux RHEL 5.6
>>>
>>> java version "1.6.0_24"
>>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>>>
>>> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>>>
>>> Startup parameters:
>>>
>>> -server
>>> -Xms256m
>>> -Xmx256m
>>> -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:MaxPermSize=64m
>>> -verbose:gc
>>> -XX:-UseGCOverheadLimit
>>> -XX:+DisableExplicitGC
>>> -XX:+UseParallelGC
>>> -XX:+UseFastAccessorMethods
>>> -XX:AdaptiveSizePolicyOutputInterval=1
>>> -XX:+PrintGCDateStamps
>>> -XX:+PrintGCTimeStamps
>>> -XX:+PrintGCDetails
>>> -XX:+PrintGCApplicationConcurrentTime
>>> -XX:+PrintGCApplicationStoppedTime
>>>
>>> The complete GC log is available here:
>>>
>>> http://dl.dropbox.com/u/3430279/gc.log
>>>
>>> but here is a short snippet from that log before and after the OOM:
>>>
>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>> Total time for which application threads were stopped: 1.2760680 seconds
>>> java.lang.OutOfMemoryError: Java heap space
>>> Dumping heap to java_pid18706.hprof ...
>>> Application time: 0.9442610 seconds
>>> Total time for which application threads were stopped: 2.4584870 seconds
>>> Heap dump file created [83874513 bytes in 3.330 secs]
>>> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>>>
>>> Note that:
>>>
>>> 1) I can reproduce this easily.
>>>
>>> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>>>
>>> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>>>
>>> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>>>
>>> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>>>
>>> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>>>
>>> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>>>
>>> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>>>
>>> I'm not sure what else to try or where else to look. Any suggestions?
>>>
>>> Cheers,
>>> Raman Gupta
>
2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.65 (attempted to grow)
Tenured generation: 0.54 (attempted to grow)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.01 (attempted to grow)
Tenured generation: 0.54 (no change)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]

The first Full GC looks normal, as PSOldGen is (174784-169093)K = 5691K from being full. Nothing is happening in Perm.
The second GC is where things start to get weird. I don't see why it was called. Stranger still, it comes ~800ms *after* the Full GC and yet no application thread allocated any memory out of young gen.
For some reason that "failed" young gen collection triggers an immediate Full GC.
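
For anyone who wants to repeat that subtraction over the whole log, here is a minimal Java sketch. It assumes the -XX:+PrintGCDetails line format shown above; the class and method names (OldGenHeadroom, headroomK) are made up for illustration and are not part of any JDK tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: pull the PSOldGen figures out of a -XX:+PrintGCDetails line. */
final class OldGenHeadroom {
    // Matches e.g. "[PSOldGen: 169093K->62518K(174784K)]"
    private static final Pattern OLD_GEN =
            Pattern.compile("\\[PSOldGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    /** Returns {freeBeforeK, freeAfterK}, or null if the line has no PSOldGen section. */
    static long[] headroomK(String logLine) {
        Matcher m = OLD_GEN.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        long before = Long.parseLong(m.group(1));
        long after = Long.parseLong(m.group(2));
        long capacity = Long.parseLong(m.group(3));
        return new long[] { capacity - before, capacity - after };
    }

    public static void main(String[] args) {
        // The first Full GC line from the log in this thread.
        String line = "[Full GC [PSYoungGen: 7662K->0K(72896K)] "
                + "[PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) "
                + "[PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs]";
        long[] free = headroomK(line);
        System.out.println("old gen free before: " + free[0] + "K, after: " + free[1] + "K");
    }
}
```

Run against the first Full GC line, this gives the 5691K of headroom quoted above.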

Bug in the collector? Did you check the bug database?

Regards,
Kirk

I did check the database but didn't find anything relevant. My search
terms may not be optimal, though I did scan through all the results
returned by "java.lang.OutOfMemoryError: Java heap space" as well as
"0K->0K".

I also suspected a bug in the collector and so I tried the same test
with the G1 collector, with the same OOM result. I didn't save the log
from the G1 test, but I can quite easily redo the test with any set of
JVM parameters that may be helpful in debugging -- the OOM seems to be
easily and consistently reproducible with this application.

Cheers,
Raman

On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.65 (attempted to grow)
> Tenured generation: 0.54 (attempted to grow)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.01 (attempted to grow)
> Tenured generation: 0.54 (no change)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>
> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
> for some reason that "failed" young gen collection triggers an immediate Full GC.
>
> Bug in the collector? Did you check the bug database?
>
> Regards,
> Kirk
>
Are you trying to create a humongous object or array? Accidentally?

On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:

> I did check the database but didn't find anything relevant. My search
> terms may not be optimal, though I did scan through all the results
> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
> "0K->0K".
>
> I also suspected a bug in the collector and so I tried the same test
> with the G1 collector, with the same OOM result. I didn't save the log
> from the G1 test, but I can quite easily redo the test with any set of
> JVM parameters that may be helpful in debugging -- the OOM seems to be
> easily and consistently reproducible with this application.
>
> Cheers,
> Raman
>

I do tend to think that somewhere a large object or array is being
created. In particular, Infinispan is one library we are using that
may be allocating large chunks of memory -- indeed, replacing
Infinispan with a local cache does seem to "fix" the problem.

However, more information from the JVM would really be useful in
isolating the offending code in Infinispan. Ideally,

a) any large allocations should show up as part of the heap dump if
the allocation succeeded but then some other subsequent code caused
the OOM, or

b) if the allocation itself failed, the OOM exception should include a
stack trace that would allow me to isolate the allocation point (as
it does normally, but for some reason in this case doesn't).

In this case the heap dump shows plenty of room in heap, and there is
no stack trace at the OOM, so I don't really have any way to isolate
the offending allocation point. In which situations does the OOM
exception get printed without an associated stack trace?

Cheers,
Raman


On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
> are you trying to create a humungous object or array? Accidentally?
>
If your code is not catching the OOM exception you'd
expect to see the stack retrace when the program dies.
If it catches the exception and carries on, you'd want
it to print the exception detail. I don't know of
cases where the exception would just disappear.

In your case the report to stdout/stderr(?) that an OOM occurred and that
the heap is being dumped comes from inside the JVM
because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
After this point, your allocating thread would have gotten
an OOME which it probably caught and swallowed, and hence
the silence wrt the stack retrace you would normally see. You
will want to look at your Infinispan code to see how
it deals with the inability to allocate said large objects.
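
A rough illustration of the catch-and-swallow case described above. SwallowedOome and tryHugeAllocation are hypothetical names, and the oversized long[] request is only an assumption chosen so that the allocation fails on an ordinary heap; the message in the resulting error may well differ from "Java heap space".

```java
/** Sketch: an OutOfMemoryError that is caught never reaches the uncaught-exception path. */
final class SwallowedOome {

    /** Tries one oversized allocation; returns the error if one was thrown, else null. */
    static OutOfMemoryError tryHugeAllocation() {
        try {
            // Assumption: a request for ~16 GiB of longs fails, either because it
            // exceeds the VM's array size limit or because the heap cannot hold it.
            long[] huge = new long[Integer.MAX_VALUE];
            System.out.println("allocated " + huge.length + " longs");
            return null;
        } catch (OutOfMemoryError e) {
            // A library that did nothing here would swallow the error silently.
            return e;
        }
    }

    public static void main(String[] args) {
        OutOfMemoryError e = tryHugeAllocation();
        if (e != null) {
            // Printing (or logging) in the catch block is what makes the site visible.
            e.printStackTrace();
        }
        System.out.println("still running after the failed allocation");
    }
}
```

If the catch block neither logged nor rethrew, the only visible sign would be the JVM's own heap-dump message, which is consistent with the log in this thread.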

Recall that object size is limited by the size of and
available space in the largest area (Eden or Old) in your
Java heap. As Kirk noted, the full gc was to attempt allocation
of an object that didn't fit into the available space in
Eden or in Old (so from that you can estimate the size of
the request).
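
As a back-of-the-envelope version of that estimate, the sketch below takes the figures from the last Full GC line before the OOM. It assumes the generations could not grow any further at that point, and it treats the reported PSYoungGen capacity as an upper bound on free Eden (the reported figure also covers survivor space). RequestSizeEstimate and lowerBoundK are hypothetical names.

```java
/** Sketch: lower bound on the size of the allocation request that failed. */
final class RequestSizeEstimate {

    /** Largest free space either generation could have offered, in K. */
    static long lowerBoundK(long youngCapacityK, long youngUsedK,
                            long oldCapacityK, long oldUsedK) {
        long youngFreeK = youngCapacityK - youngUsedK;
        long oldFreeK = oldCapacityK - oldUsedK;
        // The request fit in neither space, so it was larger than the bigger of the two.
        return Math.max(youngFreeK, oldFreeK);
    }

    public static void main(String[] args) {
        // [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)]
        long boundK = lowerBoundK(57856L, 0L, 174784L, 52158L);
        System.out.println("failed request was larger than roughly " + boundK + "K");
    }
}
```

Under those assumptions the request was larger than about 122626K, far more than the 70 MB of live objects in the heap dump.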

Note also that the JDK libraries will resize hashtables under
you and that can also cause large allocation requests
(but I don't know how they handle OOMs resulting from such
allocations).

-- ramki

On 06/02/11 09:56, Raman Gupta wrote:
> I do tend to think that somewhere a large object or array is being
> created. In particular, Infinispan is one library we are using that
> may be allocating large chunks of memory -- indeed, replacing
> Infinispan with a local cache does seem to "fix" the problem.
>
> However, more information from the JVM would really be useful in
> isolating the offending code in Infinispan. Ideally,
>
> a) any large allocations should show up as part of the heap dump if
> the allocation succeeded but then some other subsequent code caused
> the OOM, or
>
> b) if the allocation itself failed, the OOM exception should include a
> stack trace that would allow me to isolate the allocation point (as
> it does normally, but for some reason in this case doesn't).
>
> In this case the heap dump shows plenty of room in heap, and there is
> no stack trace at the OOM, so I don't really have any way to isolate
> the offending allocation point. In which situations does the OOM
> exception get printed without an associated stack trace?
>
> Cheers,
> Raman
>
>
Well, GC is somewhat orthogonal to what your application is up to, except for this special case. I've cc'ed Manik on this one; maybe he's had someone run into it before.

Regards,
Kirk

On Jun 2, 2011, at 6:56 PM, Raman Gupta wrote:

> I do tend to think that somewhere a large object or array is being
> created. In particular, Infinispan is one library we are using that
> may be allocating large chunks of memory -- indeed, replacing
> Infinispan with a local cache does seem to "fix" the problem.
>
> However, more information from the JVM would really be useful in
> isolating the offending code in Infinispan. Ideally,
>
> a) any large allocations should show up as part of the heap dump if
> the allocation succeeded but then some other subsequent code caused
> the OOM, or
>
> b) if the allocation itself failed, the OOM exception should include a
> stack trace that would allow me to isolate the allocation point (as
> it does normally, but for some reason in this case doesn't).
>
> In this case the heap dump shows plenty of room in heap, and there is
> no stack trace at the OOM, so I don't really have any way to isolate
> the offending allocation point. In which situations does the OOM
> exception get printed without an associated stack trace?
>
> Cheers,
> Raman
>
>

It would be *really* handy if there were a switch like:

-XX:+StackTraceOnOutOfMemoryError

to force the stack trace to be shown. Obviously looking at every line
of code of every library my application uses, including core JDK
libraries, for code paths where large amounts of heap may be allocated
and the associated OOME is caught and swallowed, is pretty much
impossible.

I think my next step is to increase the max heap size to a large value
which hopefully allows the large allocation to occur without failure,
and then periodically take heap dumps to isolate it.
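
One possible way to take those periodic dumps from inside the process is sketched below. It relies on the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean, and ManagementFactory.getPlatformMXBean(Class) needs Java 7 or later, so it would not run unchanged on the 1.6.0_24 JVM discussed here. The class name, file naming scheme and one-minute interval are all assumptions.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.Locale;

/** Sketch: write numbered heap dumps at a fixed interval. */
final class PeriodicHeapDump {

    /** Hypothetical naming scheme: heap-0001.hprof, heap-0002.hprof, ... */
    static String fileNameFor(int sequence) {
        return String.format(Locale.ROOT, "heap-%04d.hprof", sequence);
    }

    /** Writes one dump into dir and returns the file; fails if the file already exists. */
    static File dumpOnce(File dir, int sequence) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        File out = new File(dir, fileNameFor(sequence));
        // false = include unreachable objects too, so a short-lived large
        // array still has a chance of showing up in the dump.
        bean.dumpHeap(out.getAbsolutePath(), false);
        return out;
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (int i = 1; i <= 3; i++) {
            System.out.println("wrote " + dumpOnce(dir, i));
            Thread.sleep(60000L); // assumed interval of one minute
        }
    }
}
```

In a real application the loop would run on a background thread rather than in main.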

Thanks,
Raman

On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
> If your code is not catching the OOM exception you'd
> expect to see the stack retrace when the program dies.
> If it catches the exception and carries on, you'd want
> it to print the exception detail. I don't know of
> cases where the exception would just disappear.
>
> In your case the report to stdout/stderr(?)that an OOM occurred and that
> the heap is being dumped comes from inside the JVM
> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
> After this point, your allocating thread would have gotten
> an OOME which it probably caught and swallowed, and hence
> the silence wrt the stack retrace you would normally see. You
> will want to look at your Infinispan code to see how
> it deals with the inability to allocate said large objects.
>
> Recall that object size is limited by the size of and
> available space in the largest area (Eden or Old) in your
> Java heap. As Kirk noted, the full gc was to attempt allocation
> of an object that didn't fit into the available space in
> Eden or in Old (so from that you can estimate the size of
> the request).
>
> Note also that the JDK libraries will resize hashtables under
> you and that can also cause large allocation requests
> (but i don't know how they handle OOM's resulting from such
> allocations).
>
> -- ramki
>
You "just" have to find all the places where OOME is caught.
Hopefully there aren't too many of those?

-- ramki

On 6/2/2011 6:15 PM, Raman Gupta wrote:
> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError
>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.
>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.
>
> Thanks,
> Raman
>

Sorry, sent previous email without addressing all of the issues.

On 6/2/2011 6:15 PM, Raman Gupta wrote:
> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError

Yes that would be handy, and probably not too difficult.
But I wonder also if something like -XX:OnOutOfMemoryError or
the like would already get you enough info to get close to
the problem ... (although maybe, because it's executed in
a separate shell, by the time the command executes the
process has likely gone well past the point when the problem
occurred).

>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.

Try looking for places where OOME (or a supertype?) is caught. I am
hoping there aren't too many of those...
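
As a minimal sketch of what a swallowed OOME looks like (the class and method names here are hypothetical, not taken from Infinispan or the JDK), the first method below hides the failure behind a catch of Throwable, while the second renders the same error together with whatever stack trace the JVM attached. It assumes a HotSpot JVM, where an array length of Integer.MAX_VALUE is expected to be rejected with an OutOfMemoryError.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class SwallowedOome {
    /** Hypothetical library-style call that swallows every Throwable, OOME included. */
    static boolean allocateQuietly(int length) {
        try {
            long[] big = new long[length];
            return big.length == length;
        } catch (Throwable t) {
            // Supertype catch: the OutOfMemoryError disappears here without a trace.
            return false;
        }
    }

    /** Same allocation, but the caught error is rendered as text for logging. */
    static String allocateAndReport(int length) {
        try {
            long[] big = new long[length];
            return "allocated " + big.length + " longs";
        } catch (OutOfMemoryError e) {
            StringWriter out = new StringWriter();
            e.printStackTrace(new PrintWriter(out, true));
            return out.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println("quiet allocation succeeded: " + allocateQuietly(Integer.MAX_VALUE));
        System.out.println(allocateAndReport(Integer.MAX_VALUE));
    }
}
```

Searching the application and its libraries for catch blocks of Throwable, Error, VirtualMachineError or OutOfMemoryError is one way to find candidates that behave like the first method.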

>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.

Yes, that seems reasonable, or maybe use an allocation profiler
with the larger heap and find it that way...
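
One way to take the periodic heap dumps mentioned above from inside the application is the HotSpotDiagnostic MXBean. This is only a sketch: the class name, the file naming scheme, the dump count and the interval are assumptions for illustration, and recent JDKs expect the file name to end in .hprof.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class PeriodicHeapDump {
    /** Writes one heap dump of reachable objects to the given path (the file must not exist yet). */
    static File dumpOnce(String path) {
        try {
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(path, true); // true = dump only live (reachable) objects
            return new File(path);
        } catch (IOException e) {
            throw new IllegalStateException("heap dump to " + path + " failed", e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int dumps = 3;                // hypothetical number of dumps
        long intervalMillis = 60000L; // hypothetical pause between dumps
        for (int i = 0; i < dumps; i++) {
            File f = dumpOnce("heap-" + i + "-" + System.currentTimeMillis() + ".hprof");
            System.out.println("wrote " + f.getPath() + " (" + f.length() + " bytes)");
            if (i < dumps - 1) {
                Thread.sleep(intervalMillis);
            }
        }
    }
}
```

Comparing successive dumps in a heap analyzer should show which object graphs grow between snapshots, provided the larger heap lets the big allocation succeed.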

-- ramki

>
> Thanks,
> Raman
>
> On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
>> If your code is not catching the OOM exception you'd
>> expect to see the stack trace when the program dies.
>> If it catches the exception and carries on, you'd want
>> it to print the exception detail. I don't know of
>> cases where the exception would just disappear.
>>
>> In your case the report to stdout/stderr(?) that an OOM occurred and that
>> the heap is being dumped comes from inside the JVM
>> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
>> After this point, your allocating thread would have gotten
>> an OOME which it probably caught and swallowed, and hence
>> the silence wrt the stack trace you would normally see. You
>> will want to look at your Infinispan code to see how
>> it deals with the inability to allocate said large objects.
>>
>> Recall that object size is limited by the size of and
>> available space in the largest area (Eden or Old) in your
>> Java heap. As Kirk noted, the full gc was to attempt allocation
>> of an object that didn't fit into the available space in
>> Eden or in Old (so from that you can estimate the size of
>> the request).
>>
>> Note also that the JDK libraries will resize hashtables under
>> you and that can also cause large allocation requests
>> (but I don't know how they handle OOMs resulting from such
>> allocations).
>>
>> -- ramki
>>

)
You should get a stack trace; something must be eating it.

Regards,
Kirk

On Jun 3, 2011, at 3:15 AM, Raman Gupta wrote:

> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError
>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.
>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.
>
> Thanks,
> Raman
>

)

  #15  
03-06-2011 08:28 AM
 

I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.

Linux RHEL 5.6

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

(I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).

Startup parameters:

-server
-Xms256m
-Xmx256m
-XX:+HeapDumpOnOutOfMemoryError
-XX:MaxPermSize=64m
-verbose:gc
-XX:-UseGCOverheadLimit
-XX:+DisableExplicitGC
-XX:+UseParallelGC
-XX:+UseFastAccessorMethods
-XX:AdaptiveSizePolicyOutputInterval=1
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime

The complete GC log is available here:

http://dl.dropbox.com/u/3430279/gc.log

but here is a short snippet from that log before and after the OOM:

2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
Total time for which application threads were stopped: 1.2760680 seconds
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid18706.hprof ...
Application time: 0.9442610 seconds
Total time for which application threads were stopped: 2.4584870 seconds
Heap dump file created [83874513 bytes in 3.330 secs]
2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]

Note that:

1) I can reproduce this easily.

2) This seems to happen only once, when load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).

3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.

4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.

5) The heap dump in java_pid18706.hprof shows only 70 MB of live objects in the heap.

6) The heap dump in java_pid18706.hprof shows no objects in the finalization queue.

7) I note that the OutOfMemoryError in the log does not have a stack trace associated with it, as is normally present.

8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).

I'm not sure what else to try or where else to look. Any suggestions?
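
Observation 3 can be checked mechanically. The sketch below (class and method names are made up for illustration) scans GC log lines in the -XX:+PrintGCDetails format shown above and keeps the Full GC entries whose young generation went 0K->0K. It assumes each collection sits on a single line, as in the snippet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class EmptyYoungFullGc {
    // Matches a ParallelGC Full GC entry whose young generation was already empty.
    private static final Pattern EMPTY_YOUNG_FULL_GC =
            Pattern.compile("\\[Full GC \\[PSYoungGen: 0K->0K\\(");

    /** Returns the log lines that record a Full GC with a 0K->0K young generation. */
    static List<String> find(List<String> logLines) {
        List<String> hits = new ArrayList<String>();
        for (String line : logLines) {
            if (EMPTY_YOUNG_FULL_GC.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Lines abbreviated from the snippet in this post.
        List<String> sample = Arrays.asList(
                "2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)]",
                "2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)]",
                "java.lang.OutOfMemoryError: Java heap space");
        for (String hit : find(sample)) {
            System.out.println(hit);
        }
    }
}
```

Running it over the complete gc.log, read line by line, would show whether every OOM in a test run is preceded by such an entry.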

Cheers,
Raman Gupta
)
Raman,

The gc.log looks like it has the young collections
filtered out. Is that right? If so, please upload
the complete log.

Jon

)
+1, can't see anything in these logs that indicates you're heading to an OOME due to Java heap. But, incomplete log...

Regards,
Kirk

On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:

> Raman,
>
> The gc.log looks like it has the young collections
> filtered out. Is that right? If so, please upload
> the complete log.
>
> Jon
>

)
My bad -- I filtered out the young GCs with an errant grep command...
this log should be ok:

http://dl.dropbox.com/u/3430279/gc.log

This is from a different test run than the one before, but
demonstrates the same "problem".

Cheers,
Raman

On 06/01/2011 01:35 PM, Charles K Pepperdine wrote:
> +1, can't see anything in these logs that indicates you're heading to a OOME due to Java heap. But, incomplete log...
>
> Regards,
> Kirk
>
> On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:
>
>> Raman,
>>
>> The gc.log looks like it has the young collections
>> filtered out. Is that right? If so, please upload
>> the complete log.
>>
>> Jon
>>
>
)
2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.65 (attempted to grow)
Tenured generation: 0.54 (attempted to grow)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.01 (attempted to grow)
Tenured generation: 0.54 (no change)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]

The first Full GC looks normal, as PSOldGen is (174784-169093)K = 5691K from being full. Nothing is happening in Perm.
The second collection is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
For some reason that "failed" young gen collection triggers an immediate Full GC.
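
The headroom arithmetic above can be reproduced from the log lines themselves. A small sketch, with hypothetical class and method names, that assumes the [PSOldGen: used-before->used-after(capacity)] layout shown in this log:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OldGenHeadroom {
    // Captures used-before, used-after and capacity (all in K) of the old generation.
    private static final Pattern OLD_GEN =
            Pattern.compile("\\[PSOldGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    /** Returns {freeBeforeK, freeAfterK} for the first PSOldGen entry in the line. */
    static long[] headroomK(String gcLine) {
        Matcher m = OLD_GEN.matcher(gcLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no PSOldGen entry in: " + gcLine);
        }
        long usedBefore = Long.parseLong(m.group(1));
        long usedAfter = Long.parseLong(m.group(2));
        long capacity = Long.parseLong(m.group(3));
        return new long[] { capacity - usedBefore, capacity - usedAfter };
    }

    public static void main(String[] args) {
        String first = "[Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K)";
        String second = "[Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K)";
        long[] a = headroomK(first);
        long[] b = headroomK(second);
        System.out.println("first Full GC:  " + a[0] + "K free before, " + a[1] + "K free after");
        System.out.println("second Full GC: " + b[0] + "K free before, " + b[1] + "K free after");
        // prints 5691K / 112266K for the first line and 112266K / 122626K for the second
    }
}
```

With roughly 122626K free in the old generation after the second Full GC, a request that still failed would have to be larger than that, which is consistent with a single very large allocation rather than a slowly filling heap.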

Bug in the collector? Did you check the bug database?

Regards,
Kirk

)
I did check the database but didn't find anything relevant. My search
terms may not be optimal, though I did scan through all the results
returned by "java.lang.OutOfMemoryError: Java heap space" as well as
"0K->0K".

I also suspected a bug in the collector and so I tried the same test
with the G1 collector, with the same OOM result. I didn't save the log
from the G1 test, but I can quite easily redo the test with any set of
JVM parameters that may be helpful in debugging -- the OOM seems to be
easily and consistently reproducible with this application.

Cheers,
Raman

On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.65 (attempted to grow)
> Tenured generation: 0.54 (attempted to grow)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.01 (attempted to grow)
> Tenured generation: 0.54 (no change)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>
> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
> for some reason that "failed" young gen collection triggers an immediate Full GC.
>
> Bug in the collector? Did you check the bug database?
>
> Regards,
> Kirk
>
)
Are you trying to create a humungous object or array? Accidentally?

On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:

> I did check the database but didn't find anything relevant. My search
> terms may not be optimal, though I did scan through all the results
> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
> "0K->0K".
>
> I also suspected a bug in the collector and so I tried the same test
> with the G1 collector, with the same OOM result. I didn't save the log
> from the G1 test, but I can quite easily redo the test with any set of
> JVM parameters that may be helpful in debugging -- the OOM seems to be
> easily and consistently reproducible with this application.
>
> Cheers,
> Raman
>

)
I do tend to think that somewhere a large object or array is being
created. In particular, Infinispan is one library we are using that
may be allocating large chunks of memory -- indeed, replacing
Infinispan with a local cache does seem to "fix" the problem.

However, more information from the JVM would really be useful in
isolating the offending code in Infinispan. Ideally,

a) any large allocations should show up as part of the heap dump if
the allocation succeeded but then some other subsequent code caused
the OOM, or

b) if the allocation itself failed, the OOM exception should include a
stack trace that would allow me to isolate the allocation point (as
it does normally, but for some reason in this case doesn't).

In this case the heap dump shows plenty of room in heap, and there is
no stack trace at the OOM, so I don't really have any way to isolate
the offending allocation point. In which situations does the OOM
exception get printed without an associated stack trace?

Cheers,
Raman


On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
> are you trying to create a humungous object or array? Accidentally?
)
If your code is not catching the OOM exception you'd
expect to see the stack retrace when the program dies.
If it catches the exception and carries on, you'd want
it to print the exception detail. I don't know of
cases where the exception would just disappear.

In your case the report to stdout/stderr(?) that an OOM occurred and that
the heap is being dumped comes from inside the JVM
because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
After this point, your allocating thread would have gotten
an OOME which it probably caught and swallowed, and hence
the silence wrt the stack retrace you would normally see. You
will want to look at your Infinispan code to see how
it deals with the inability to allocate said large objects.
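
For illustration, here is a minimal sketch of how a caught-and-swallowed
OOME produces this kind of silence. The class and method names are
hypothetical and are not taken from Infinispan or the JDK; the oversized
request is just a convenient way to make the allocation fail.

```java
class SwallowedOome {
    /**
     * Tries to allocate an array of the given length.
     * Returns null when the JVM cannot satisfy the request.
     */
    static long[] tryAllocate(int length) {
        try {
            return new long[length];
        } catch (OutOfMemoryError e) {
            // The error is discarded here: no printStackTrace(), no rethrow,
            // so nothing identifies this allocation site afterwards.
            return null;
        }
    }

    public static void main(String[] args) {
        long[] big = tryAllocate(Integer.MAX_VALUE);
        System.out.println(big == null
                ? "allocation failed, error swallowed"
                : "allocated " + big.length + " longs");
    }
}
```

Calling e.printStackTrace() or rethrowing inside the catch block would be
enough to make the allocation site visible again.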

Recall that object size is limited by the size of and
available space in the largest area (Eden or Old) in your
Java heap. As Kirk noted, the full gc was to attempt allocation
of an object that didn't fit into the available space in
Eden or in Old (so from that you can estimate the size of
the request).
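
As a rough worked example of that estimate, using only the figures from
the final Full GC line in the log (148.632): the sketch below assumes the
request had to fit contiguously in a single space, and treats the whole
young generation capacity as an upper bound on Eden, since the log does
not report Eden separately. The class name is made up for illustration.

```java
class AllocationEstimate {
    /** Free space in KB for a generation logged as "usedK(capacityK)". */
    static long freeKb(long capacityKb, long usedKb) {
        return capacityKb - usedKb;
    }

    /** Largest free space after the final Full GC, from the logged figures. */
    static long largestFreeKb() {
        long oldFree = freeKb(174784, 52158);   // PSOldGen: 62518K->52158K(174784K)
        long youngFree = freeKb(57856, 0);      // PSYoungGen: 0K->0K(57856K), upper bound on Eden
        return Math.max(oldFree, youngFree);
    }

    public static void main(String[] args) {
        System.out.println("old gen free:      " + freeKb(174784, 52158) + "K");
        System.out.println("young gen free <=  " + freeKb(57856, 0) + "K");
        System.out.println("failed request >   " + largestFreeKb() + "K (estimate)");
    }
}
```

On these numbers the failed request would have been larger than roughly
122626K (about 120 MB), if the reasoning above holds.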

Note also that the JDK libraries will resize hashtables under
you and that can also cause large allocation requests
(but I don't know how they handle OOMs resulting from such
allocations).
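
To give a feel for how large such a resize request can get, the sketch
below computes the bucket-array length java.util.HashMap would need for
a given entry count, using its documented defaults (initial capacity 16,
load factor 0.75, doubling on resize). The bytes-per-reference figure is
an assumption (4 with compressed oops, 8 without), array header overhead
is ignored, and the class name is hypothetical.

```java
class HashMapTableEstimate {
    private static final long MAX_CAPACITY = 1L << 30; // HashMap's maximum table length

    /** Table length a default HashMap ends up with after inserting this many entries. */
    static long capacityFor(long entries) {
        long capacity = 16;
        // HashMap resizes once the entry count exceeds capacity * loadFactor.
        while (capacity < MAX_CAPACITY && entries > capacity * 0.75) {
            capacity <<= 1;
        }
        return capacity;
    }

    /** Approximate size of the bucket array alone, in bytes. */
    static long tableBytes(long entries, int bytesPerReference) {
        return capacityFor(entries) * bytesPerReference;
    }

    public static void main(String[] args) {
        long entries = 10000000L;
        System.out.println("table length: " + capacityFor(entries));
        System.out.println("table bytes (4-byte refs): " + tableBytes(entries, 4));
        System.out.println("table bytes (8-byte refs): " + tableBytes(entries, 8));
    }
}
```

For ten million entries this gives a table of 16777216 slots, i.e. a
single array of about 64 MB with 4-byte references, and the old table is
still reachable while the new one is being filled.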

-- ramki

)
Well, GC is somewhat orthogonal to what your application is up to, except for this special case. I've cc'ed Manik on this one; maybe he's had someone run into it before.

Regards,
Kirk

)
It would be *really* handy if there were a switch like:

-XX:+StackTraceOnOutOfMemoryError

to force the stack trace to be shown. Obviously looking at every line
of code of every library my application uses, including core JDK
libraries, for code paths where large amounts of heap may be allocated
and the associated OOME is caught and swallowed, is pretty much
impossible.

I think my next step is to increase the max heap size to a large value
which hopefully allows the large allocation to occur without failure,
and then periodically take heap dumps to isolate it.
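
One way to take the periodic dumps from inside the application is the
HotSpotDiagnostic MXBean, sketched below. The class name, file prefix,
dump count and interval are placeholders. The second argument to
dumpHeap is false so that unreachable objects are included, on the
assumption that the large allocation may already be garbage by the time
a dump is taken.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

class PeriodicHeapDump {
    /** Builds a distinct file name per dump; dumpHeap fails if the file already exists. */
    static String dumpFileName(String prefix, int sequence) {
        return prefix + "-" + sequence + ".hprof";
    }

    /** Writes a heap dump of the current JVM to the given file. */
    static void dumpHeap(String file) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file, false); // false = include unreachable objects too
    }

    public static void main(String[] args) throws Exception {
        int dumps = 3;             // placeholder count
        long intervalMs = 60000L;  // placeholder interval
        for (int i = 0; i < dumps; i++) {
            dumpHeap(dumpFileName("app-heap", i));
            Thread.sleep(intervalMs);
        }
    }
}
```

In a real application the loop would run on a background thread rather
than in main.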

Thanks,
Raman

)
You "just" have to find all the places where OOME is caught.
Hopefully there aren't too many of those?

-- ramki

)
Sorry, sent previous email without addressing all of the issues.

On 6/2/2011 6:15 PM, Raman Gupta wrote:
> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError

Yes that would be handy, and probably not too difficult.
But I wonder also if something like -XX:OnOutOfMemoryError or
the like would already get you enough info to get close to
the problem ... (although maybe, because it's executed in
a separate shell, by the time the command executes the
process has likely gone well past the point when the problem
occurred).

>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.

Try to look for places where OOME (or a supertype?) is caught. I am
hoping there aren't too many of those...
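
A crude way to narrow that search when sources are available is to scan
for catch clauses that name OutOfMemoryError or one of its supertypes
(VirtualMachineError, Error, Throwable). The sketch below is a
line-based heuristic with a made-up class name; it will miss clauses
split across lines and says nothing about libraries shipped without
source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class OomeCatchFinder {
    // A catch clause whose type list mentions OutOfMemoryError or a supertype.
    private static final Pattern CATCH = Pattern.compile(
            "catch\\s*\\([^)]*\\b(OutOfMemoryError|VirtualMachineError|Error|Throwable)\\b");

    /** Returns the 1-based numbers of the lines that contain a matching catch clause. */
    static List<Integer> findLines(String source) {
        List<Integer> hits = new ArrayList<Integer>();
        String[] lines = source.split("\n");
        for (int i = 0; i < lines.length; i++) {
            if (CATCH.matcher(lines[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample =
                "try {\n"
              + "    cache.put(key, value);\n"
              + "} catch (Throwable t) {\n"
              + "    // ignored\n"
              + "}\n";
        System.out.println("suspect lines: " + findLines(sample));
    }
}
```

Each hit still has to be read by hand to see whether the error is
logged, rethrown, or dropped.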

>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.

Yes, that seems reasonable, or maybe use an allocation profiler
with the larger heap and find it that way...

-- ramki

)
You should get a stack trace; something must be eating it.

Regards,
Kirk

On Jun 3, 2011, at 3:15 AM, Raman Gupta wrote:

> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError
>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.
>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.
>
> Thanks,
> Raman
>
> On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
>> If your code is not catching the OOM exception you'd
>> expect to see the stack retrace when the program dies.
>> If it catches the exception and carries on, you'd want
>> it to print the exception detail. I don't know of
>> cases where the exception would just disappear.
>>
>> In your case the report to stdout/stderr(?)that an OOM occurred and that
>> the heap is being dumped comes from inside the JVM
>> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
>> After this point, your allocating thread would have gotten
>> an OOME which it probably caught and swallowed, and hence
>> the silence wrt the stack retrace you would normally see. You
>> will want to look at your Infinispan code to see how
>> it deals with the inability to allocate said large objects.
>>
>> Recall that object size is limited by the size of and
>> available space in the largest area (Eden or Old) in your
>> Java heap. As Kirk noted, the full gc was to attempt allocation
>> of an object that didn't fit into the available space in
>> Eden or in Old (so from that you can estimate the size of
>> the request).
>>
>> Note also that the JDK libraries will resize hashtables under
>> you and that can also cause large allocation requests
>> (but I don't know how they handle OOMs resulting from such
>> allocations).
>>
>> -- ramki
>>

)
Y. Srinivas Ramakrishna wrote:
> Sorry, sent previous email without addressing all of the issues.
>
> On 6/2/2011 6:15 PM, Raman Gupta wrote:
> > It would be *really* handy if there were a switch like:
> >
> > -XX:+StackTraceOnOutOfMemoryError
>
> Yes that would be handy, and probably not too difficult.
> But I wonder also if something like OnOutOfMemoryError or
> the like would already get you enough info to get close to
> the problem ... (although maybe, because it's executed in
> a separate shell, by the time the command executes the
> process has likely gone well past the point when the problem
> occurred).

No need to worry, the OnOutOfMemoryError commands are run while the
JVM is at a safepoint. This worked for me:

java -XX:OnOutOfMemoryError='jstack %p' ...

-John
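
For code that catches the error itself, a rough in-process analogue of the jstack output can be built from Thread.getAllStackTraces(). The sketch below is an assumption about how one might use that API and is not part of the OnOutOfMemoryError mechanism; the class and method names are made up, and the OutOfMemoryError in main is simulated. Note that under a real OOM, building the dump string could itself fail for lack of memory.

```java
import java.util.Map;

final class ThreadDumpSketch {
    // Formats every live thread's name, state, and frames, loosely like jstack.
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            out.append('"').append(t.getName()).append("\" state=")
               .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        try {
            // Simulated failure; a real one would come from a failed allocation.
            throw new OutOfMemoryError("Java heap space (simulated)");
        } catch (OutOfMemoryError e) {
            System.err.print(dumpAllThreads());
        }
    }
}
```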

> > to force the stack trace to be shown. Obviously looking at every line
> > of code of every library my application uses, including core JDK
> > libraries, for code paths where large amounts of heap may be allocated
> > and the associated OOME is caught and swallowed, is pretty much
> > impossible.
>
> Try to look for places where OOME (or supertype?) is caught. I am
> hoping there aren't too many of those...
>
> >
> > I think my next step is to increase the max heap size to a large value
> > which hopefully allows the large allocation to occur without failure,
> > and then periodically take heap dumps to isolate it.
>
> Yes that seems reasonable, or maybe use an allocation profiler
> with the larger heap and find it that way...
>
> -- ramki
>
> >
> > Thanks,
> > Raman
> >
)

  #16  
03-06-2011 03:52 PM
Hotspot-gc-dev member admin is online now
User
 

I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.

Linux RHEL 5.6

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

(I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).

Startup parameters:

-server
-Xms256m
-Xmx256m
-XX:+HeapDumpOnOutOfMemoryError
-XX:MaxPermSize=64m
-verbose:gc
-XX:-UseGCOverheadLimit
-XX:+DisableExplicitGC
-XX:+UseParallelGC
-XX:+UseFastAccessorMethods
-XX:AdaptiveSizePolicyOutputInterval=1
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime

The complete GC log is available here:

http://dl.dropbox.com/u/3430279/gc.log

but here is a short snippet from that log before and after the OOM:

2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
Total time for which application threads were stopped: 1.2760680 seconds
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid18706.hprof ...
Application time: 0.9442610 seconds
Total time for which application threads were stopped: 2.4584870 seconds
Heap dump file created [83874513 bytes in 3.330 secs]
2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]

Note that:

1) I can reproduce this easily.

2) This seems to happen only once, when load is first applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).

3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.

4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.

5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.

6) The heap dump in pid18706.hprof shows no objects in the finalization queue.

7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.

8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).

I'm not sure what else to try or where else to look. Any suggestions?

Cheers,
Raman Gupta
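
Points 4 and 5 can be cross-checked from inside the running JVM. The following is a minimal sketch, assuming the standard java.lang.management API; the class and method names are made up for illustration. Pool names differ between collectors (with -XX:+UseParallelGC on this JVM I would expect names such as "PS Old Gen" and "PS Perm Gen", but treat that as an assumption to verify).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class PoolOccupancySketch {
    // One line per memory pool: name, used KB, max KB (-1 when no max is defined).
    static List<String> describePools() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            long maxKb = usage.getMax() < 0 ? -1 : usage.getMax() / 1024;
            lines.add(pool.getName() + ": used=" + (usage.getUsed() / 1024)
                    + "K max=" + maxKb + "K");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```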
)
Raman,

The gc.log looks like it has the young collections
filtered out. Is that right? If so, please upload
the complete log.

Jon

)
+1, can't see anything in these logs that indicates you're heading to an OOME due to Java heap. But, incomplete log...

Regards,
Kirk

On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:

> Raman,
>
> The gc.log looks like it has the young collections
> filtered out. Is that right? If so, please upload
> the complete log.
>
> Jon
>

)
My bad -- I filtered out the young GCs with an errant grep command...
this log should be ok:

http://dl.dropbox.com/u/3430279/gc.log

This is from a different test run than the one before, but
demonstrates the same "problem".

Cheers,
Raman

On 06/01/2011 01:35 PM, Charles K Pepperdine wrote:
> +1, can't see anything in these logs that indicates you're heading to an OOME due to Java heap. But, incomplete log...
>
> Regards,
> Kirk
>
)
2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.65 (attempted to grow)
Tenured generation: 0.54 (attempted to grow)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.01 (attempted to grow)
Tenured generation: 0.54 (no change)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]

The first Full GC looks normal, as PSOldGen is (174784-169093)K = 5691K from being full. Nothing is happening in Perm.
The second collection is where things start to get weird. I don't see why that GC was called. Stranger still, it comes ~800ms *after* the Full GC, and yet no application thread allocated any memory out of young gen.
For some reason that "failed" young gen collection triggers an immediate Full GC.
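
The headroom arithmetic above can be checked mechanically. This is a minimal sketch, assuming the PSOldGen entry always has the "used-before -> used-after (capacity)" shape seen in these log lines; the class name, method name, and regular expression are my own and are not part of any GC log tooling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OldGenHeadroom {
    // Matches e.g. "[PSOldGen: 169093K->62518K(174784K)]".
    private static final Pattern OLD_GEN =
            Pattern.compile("\\[PSOldGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    // Returns {free KB before GC, free KB after GC}, or null if there is no PSOldGen entry.
    static long[] headroomKb(String gcLogLine) {
        Matcher m = OLD_GEN.matcher(gcLogLine);
        if (!m.find()) {
            return null;
        }
        long before = Long.parseLong(m.group(1));
        long after = Long.parseLong(m.group(2));
        long capacity = Long.parseLong(m.group(3));
        return new long[] { capacity - before, capacity - after };
    }

    public static void main(String[] args) {
        String firstFullGc = "[Full GC [PSYoungGen: 7662K->0K(72896K)] "
                + "[PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K)";
        long[] h = headroomKb(firstFullGc);
        // Prints: 5691K free before, 112266K free after
        System.out.println(h[0] + "K free before, " + h[1] + "K free after");
    }
}
```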

Bug in the collector? Did you check the bug database?

Regards,
Kirk

)
I did check the database but didn't find anything relevant. My search
terms may not be optimal, though I did scan through all the results
returned by "java.lang.OutOfMemoryError: Java heap space" as well as
"0K->0K".

I also suspected a bug in the collector and so I tried the same test
with the G1 collector, with the same OOM result. I didn't save the log
from the G1 test, but I can quite easily redo the test with any set of
JVM parameters that may be helpful in debugging -- the OOM seems to be
easily and consistently reproducible with this application.

Cheers,
Raman

On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.65 (attempted to grow)
> Tenured generation: 0.54 (attempted to grow)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.01 (attempted to grow)
> Tenured generation: 0.54 (no change)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>
> The first Full GC looks normal, as PSOldGen is (174784-169093)K = 5691K from being full. Nothing is happening in Perm.
> The second collection is where things start to get weird. I don't see why that GC was called. Stranger still, it comes ~800ms *after* the Full GC, and yet no application thread allocated any memory out of young gen.
> For some reason that "failed" young gen collection triggers an immediate Full GC.
>
> Bug in the collector? Did you check the bug database?
>
> Regards,
> Kirk
>
)
Are you trying to create a humongous object or array? Accidentally?

On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:

> I did check the database but didn't find anything relevant. My search
> terms may not be optimal, though I did scan through all the results
> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
> "0K->0K".
>
> I also suspected a bug in the collector and so I tried the same test
> with the G1 collector, with the same OOM result. I didn't save the log
> from the G1 test, but I can quite easily redo the test with any set of
> JVM parameters that may be helpful in debugging -- the OOM seems to be
> easily and consistently reproducible with this application.
>
> Cheers,
> Raman
>

)
I do tend to think that somewhere a large object or array is being
created. In particular, Infinispan is one library we are using that
may be allocating large chunks of memory -- indeed, replacing
Infinispan with a local cache does seem to "fix" the problem.

However, more information from the JVM would really be useful in
isolating the offending code in Infinispan. Ideally,

a) any large allocations should show up as part of the heap dump if
the allocation succeeded but then some other subsequent code caused
the OOM, or

b) if the allocation itself failed, the OOM exception should include a
stack trace that would allow me to isolate the allocation point (as
it does normally, but for some reason in this case doesn't).

In this case the heap dump shows plenty of room in heap, and there is
no stack trace at the OOM, so I don't really have any way to isolate
the offending allocation point. In which situations does the OOM
exception get printed without an associated stack trace?

Cheers,
Raman


On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
> are you trying to create a humungous object or array? Accidentally?
>
> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>
>> I did check the database but didn't find anything relevant. My search
>> terms may not be optimal, though I did scan through all the results
>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>> "0K->0K".
>>
>> I also suspected a bug in the collector and so I tried the same test
>> with the G1 collector, with the same OOM result. I didn't save the log
>> from the G1 test, but I can quite easily redo the test with any set of
>> JVM parameters that may be helpful in debugging -- the OOM seems to be
>> easily and consistently reproducible with this application.
>>
>> Cheers,
>> Raman
>>
>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>> GC overhead (%)
>>> Young generation: 2.65 (attempted to grow)
>>> Tenured generation: 0.54 (attempted to grow)
>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>> GC overhead (%)
>>> Young generation: 2.01 (attempted to grow)
>>> Tenured generation: 0.54 (no change)
>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>
>>> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
>>> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
>>> for some reason that "failed" young gen collection triggers an immediate Full GC.
>>>
>>> Bug in the collector? Did you check the bug database?
>>>
>>> Regards,
>>> Kirk
>>>
>
)
If your code is not catching the OOM exception you'd
expect to see the stack trace when the program dies.
If it catches the exception and carries on, you'd want
it to print the exception detail. I don't know of
cases where the exception would just disappear.

In your case the report to stdout/stderr(?) that an OOM occurred and that
the heap is being dumped comes from inside the JVM
because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
After this point, your allocating thread would have gotten
an OOME which it probably caught and swallowed, and hence
the silence wrt the stack trace you would normally see. You
will want to look at your Infinispan code to see how
it deals with the inability to allocate said large objects.
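
To illustrate the "caught and swallowed" case, here is a minimal sketch. The class and method names are hypothetical and are not taken from Infinispan or the JDK. It requests an array of Integer.MAX_VALUE longs, which HotSpot refuses with an OutOfMemoryError regardless of heap size, so the failure is easy to reproduce; the failing allocation in this thread was presumably a smaller request that simply did not fit in a 256m heap.

```java
final class SwallowedOomDemo {

    /** Allocates quietly: on failure the error is dropped, as a careless library might do. */
    static boolean allocateQuietly(int length) {
        try {
            long[] block = new long[length];
            return block.length == length;
        } catch (OutOfMemoryError e) {
            // Swallowed: no rethrow and no logging, so no stack trace is ever printed.
            return false;
        }
    }

    /** Same allocation, but the handler reports where the failure happened. */
    static boolean allocateLoudly(int length) {
        try {
            long[] block = new long[length];
            return block.length == length;
        } catch (OutOfMemoryError e) {
            // Printing the trace is what makes the allocation site visible.
            e.printStackTrace();
            return false;
        }
    }

    public static void main(String[] args) {
        int tooLarge = Integer.MAX_VALUE; // assumption: chosen only so the request always fails
        System.out.println("quiet allocation succeeded: " + allocateQuietly(tooLarge));
        System.out.println("loud allocation succeeded:  " + allocateLoudly(tooLarge));
    }
}
```

With the quiet variant the application only sees a boolean; the loud variant prints the trace that identifies the allocation site.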

Recall that object size is limited by the size of and
available space in the largest area (Eden or Old) in your
Java heap. As Kirk noted, the full gc was to attempt allocation
of an object that didn't fit into the available space in
Eden or in Old (so from that you can estimate the size of
the request).
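
As a rough worked example of that estimate, the sketch below uses the figures from the second Full GC line quoted in this thread (PSOldGen: 62518K->52158K(174784K), PSYoungGen: 0K->0K(57856K)). It assumes the failed request was retried against the old generation after that collection; the class and method names are made up for illustration.

```java
final class AllocationSizeEstimate {

    /** Free space in KB, given the "used" and "capacity" figures printed in the GC log. */
    static long freeKb(long usedKb, long capacityKb) {
        return capacityKb - usedKb;
    }

    public static void main(String[] args) {
        // PSOldGen: 62518K->52158K(174784K) after the second Full GC.
        long oldFreeKb = freeKb(52158, 174784);
        // PSYoungGen: 0K->0K(57856K); Eden is only part of this capacity.
        long youngCapacityKb = 57856;

        System.out.printf("Old gen free after Full GC: %dK%n", oldFreeKb);
        System.out.printf("Young gen capacity (upper bound on Eden): %dK%n", youngCapacityKb);
        // Under the stated assumption, the request did not fit in the old gen free space.
        System.out.printf("Request was probably larger than about %d MB%n", oldFreeKb / 1024);
    }
}
```

Since the old generation had 174784K - 52158K = 122626K free, which is more than the whole young generation, the request would have had to exceed roughly that amount.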

Note also that the JDK libraries will resize hashtables under
you and that can also cause large allocation requests
(but I don't know how they handle OOMs resulting from such
allocations).

-- ramki

)
Well, GC is somewhat orthogonal to what your application is up to, except for this special case. I've cc'ed Manik on this one; maybe he's had someone run into it before.

Regards,
Kirk


)
It would be *really* handy if there were a switch like:

-XX:+StackTraceOnOutOfMemoryError

to force the stack trace to be shown. Obviously looking at every line
of code of every library my application uses, including core JDK
libraries, for code paths where large amounts of heap may be allocated
and the associated OOME is caught and swallowed, is pretty much
impossible.

I think my next step is to increase the max heap size to a large value
which hopefully allows the large allocation to occur without failure,
and then periodically take heap dumps to isolate it.

Thanks,
Raman

)
You "just" have to find all the places where OOME is caught.
Hopefully there aren't too many of those?

-- ramki


)
Sorry, sent previous email without addressing all of the issues.

On 6/2/2011 6:15 PM, Raman Gupta wrote:
> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError

Yes, that would be handy, and probably not too difficult.
But I wonder whether something like -XX:OnOutOfMemoryError or the
like would already get you enough info to get close to
the problem ... (although, because the command is executed in
a separate shell, by the time it runs the
process has likely gone well past the point where the problem
occurred).
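
For reference, a sketch of how that flag is usually written. The command string is an assumption chosen for illustration: `kill -3 %p` sends SIGQUIT, which makes HotSpot print a thread dump to stdout, and `%p` is expanded by the JVM to its own process id. Whether the dump still shows the frame that requested the allocation is not guaranteed, for the reason given above.

```
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError="kill -3 %p"
```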

>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.

Try to look for places where OOME (or supertype?) is caught. I am
hoping there aren't too many of those...
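
One way to narrow that search is sketched below, assuming the library sources are available as text. It flags catch clauses that name OutOfMemoryError or one of its supertypes (VirtualMachineError, Error, Throwable). The class name and the regular expression are illustrative only; a regex over source text can miss unusual formatting, and compiled-only jars would need a bytecode tool instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OomCatchScanner {

    // A catch clause whose parameter list names OutOfMemoryError or a supertype of it.
    private static final Pattern CATCH = Pattern.compile(
            "catch\\s*\\([^)]*\\b(OutOfMemoryError|VirtualMachineError|Error|Throwable)\\b[^)]*\\)");

    /** Returns "lineNumber: caughtType" for every line of the source that matches. */
    static List<String> findCatchSites(String source) {
        List<String> hits = new ArrayList<String>();
        String[] lines = source.split("\n");
        for (int i = 0; i < lines.length; i++) {
            Matcher m = CATCH.matcher(lines[i]);
            if (m.find()) {
                hits.add((i + 1) + ": " + m.group(1));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample =
                "try { cache.put(key, value); }\n"
              + "catch (Throwable t) { /* ignored */ }\n"
              + "catch (IOException e) { log(e); }\n";
        for (String hit : findCatchSites(sample)) {
            System.out.println(hit);
        }
    }
}
```

Each hit still has to be read by hand to see whether the handler rethrows, logs, or silently drops the error.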

>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.

Yes, that seems reasonable, or maybe use an allocation profiler
with the larger heap and find it that way...

-- ramki


)
You should get a stack trace; something must be eating it.

Regards,
Kirk

On Jun 3, 2011, at 3:15 AM, Raman Gupta wrote:

> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError
>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.
>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.
>
> Thanks,
> Raman
>
> On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
>> If your code is not catching the OOM exception you'd
>> expect to see the stack trace when the program dies.
>> If it catches the exception and carries on, you'd want
>> it to print the exception detail. I don't know of
>> cases where the exception would just disappear.
>>
>> In your case the report to stdout/stderr(?) that an OOM occurred and that
>> the heap is being dumped comes from inside the JVM
>> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
>> After this point, your allocating thread would have gotten
>> an OOME which it probably caught and swallowed, and hence
>> the silence wrt the stack trace you would normally see. You
>> will want to look at your Infinispan code to see how
>> it deals with the inability to allocate said large objects.
>>
>> Recall that object size is limited by the size of and
>> available space in the largest area (Eden or Old) in your
>> Java heap. As Kirk noted, the full gc was to attempt allocation
>> of an object that didn't fit into the available space in
>> Eden or in Old (so from that you can estimate the size of
>> the request).
>>
>> Note also that the JDK libraries will resize hashtables under
>> you and that can also cause large allocation requests
>> (but I don't know how they handle OOMs resulting from such
>> allocations).
>>
>> -- ramki
>>
>> On 06/02/11 09:56, Raman Gupta wrote:
>>> I do tend to think that somewhere a large object or array is being
>>> created. In particular, Infinispan is one library we are using that
>>> may be allocating large chunks of memory -- indeed, replacing
>>> Infinispan with a local cache does seem to "fix" the problem.
>>>
>>> However, more information from the JVM would really be useful in
>>> isolating the offending code in Infinispan. Ideally,
>>>
>>> a) any large allocations should show up as part of the heap dump if
>>> the allocation succeeded but then some other subsequent code caused
>>> the OOM, or
>>>
>>> b) if the allocation itself failed, the OOM exception should include a
>>> stack trace that would allow me to isolate the allocation point (as
>>> it does normally, but for some reason in this case doesn't).
>>>
>>> In this case the heap dump shows plenty of room in heap, and there is
>>> no stack trace at the OOM, so I don't really have any way to isolate
>>> the offending allocation point. In which situations does the OOM
>>> exception get printed without an associated stack trace?
>>>
>>> Cheers,
>>> Raman
>>>
>>>

)
Y. Srinivas Ramakrishna wrote:
> Sorry, sent previous email without addressing all of the issues.
>
> On 6/2/2011 6:15 PM, Raman Gupta wrote:
> > It would be *really* handy if there were a switch like:
> >
> > -XX:+StackTraceOnOutOfMemoryError
>
> Yes that would be handy, and probably not too difficult.
> But I wonder also if something like OnOutOfMemoryError or
> the like would already get you enough info to get close to
> the problem ... (although maybe, because it's executed in
> a separate shell, by the time the command executes the
> process has likely gone well past the point when the problem
> occurred).

No need to worry, the OnOutOfMemoryError commands are run while the
JVM is at a safepoint. This worked for me:

java -XX:OnOutOfMemoryError='jstack %p' ...

-John

> > to force the stack trace to be shown. Obviously looking at every line
> > of code of every library my application uses, including core JDK
> > libraries, for code paths where large amounts of heap may be allocated
> > and the associated OOME is caught and swallowed, is pretty much
> > impossible.
>
> Try to look for places where OOME (or supertype?) is caught. I am
> hoping there aren't too many of those...
>
> >
> > I think my next step is to increase the max heap size to a large value
> > which hopefully allows the large allocation to occur without failure,
> > and then periodically take heap dumps to isolate it.
>
> Yes that seems reasonable, or maybe use an allocation profiler
> with the larger heap and find it that way...
>
> -- ramki
>
> >
> > Thanks,
> > Raman
> >
>
)
On 06/03/2011 03:28 AM, John Coomes wrote:
> Y. Srinivas Ramakrishna wrote:
>> Sorry, sent previous email without addressing all of the issues.
>>
>> On 6/2/2011 6:15 PM, Raman Gupta wrote:
>>> It would be *really* handy if there were a switch like:
>>>
>>> -XX:+StackTraceOnOutOfMemoryError
>>
>> Yes that would be handy, and probably not too difficult.
>> But I wonder also if something like OnOutOfMemoryError or
>> the like would already get you enough info to get close to
>> the problem ... (although maybe, because it's executed in
>> a separate shell, by the time the command executes the
>> process has likely gone well past the point when the problem
>> occurred).
>
> No need to worry, the OnOutOfMemoryError commands are run while the
> JVM is at a safepoint. This worked for me:
>
> java -XX:OnOutOfMemoryError='jstack %p' ...
>
> -John

Excellent -- will try this later today.

I did a quick search for places where OOME is caught and swallowed and
found a few places within the JDK (such as direct ByteBuffer
allocation), as well as a couple places in other libraries such as
commons-pool and jgroups, the latter of which is used by Infinispan
(though in some cases, but not all, those are logged before being
swallowed). In short though, I still don't definitively know where the
problem allocation is. So running jstack via OnOutOfMemoryError sounds
like it is just the ticket.

Cheers,
Raman
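
A minimal Java sketch of the "caught and swallowed" case discussed above, offered as an illustration and not as the actual Infinispan, jgroups or JDK code. The class SwallowedOomDemo and the helper tryAllocate are hypothetical names. The helper attempts one large array allocation and hands the OutOfMemoryError back instead of letting it propagate; if the caller never logs it, no stack trace reaches the console, which would be consistent with the silence seen in this thread.

```java
import java.util.Optional;

/** Minimal sketch: a caught-and-swallowed OutOfMemoryError leaves no stack trace. */
class SwallowedOomDemo {

    /**
     * Hypothetical helper (not from the thread): tries to allocate a long[] of the
     * given length and returns the OutOfMemoryError if the allocation failed.
     */
    static Optional<OutOfMemoryError> tryAllocate(int length) {
        try {
            long[] block = new long[length];
            // Touch the array so the allocation is actually used.
            if (block.length > 0) {
                block[block.length - 1] = 1L;
            }
            return Optional.empty();
        } catch (OutOfMemoryError e) {
            // Code that stops here and carries on has "swallowed" the error:
            // nothing is printed unless the caller chooses to log it.
            return Optional.of(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: a long[] of Integer.MAX_VALUE elements (roughly 16 GB)
        // cannot be satisfied by the heap this demo runs with.
        Optional<OutOfMemoryError> failure = tryAllocate(Integer.MAX_VALUE);
        if (failure.isPresent()) {
            // Logging the caught error is what exposes the allocation site.
            failure.get().printStackTrace();
        } else {
            System.out.println("Allocation succeeded; no OOME to report.");
        }
    }
}
```

Dropping the printStackTrace() call reproduces the symptom: the allocation fails, the program carries on, and the only evidence left is whatever the JVM itself prints.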
)

  #17  
03-06-2011 04:26 PM
 

I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.

Linux RHEL 5.6

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

(I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).

Startup parameters:

-server
-Xms256m
-Xmx256m
-XX:+HeapDumpOnOutOfMemoryError
-XX:MaxPermSize=64m
-verbose:gc
-XX:-UseGCOverheadLimit
-XX:+DisableExplicitGC
-XX:+UseParallelGC
-XX:+UseFastAccessorMethods
-XX:AdaptiveSizePolicyOutputInterval=1
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime

The complete GC log is available here:

http://dl.dropbox.com/u/3430279/gc.log

but here is a short snippet from that log before and after the OOM:

2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
Total time for which application threads were stopped: 1.2760680 seconds
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid18706.hprof ...
Application time: 0.9442610 seconds
Total time for which application threads were stopped: 2.4584870 seconds
Heap dump file created [83874513 bytes in 3.330 secs]
2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]

Note that:

1) I can reproduce this easily.

2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).

3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.

4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.

5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.

6) The heap dump in pid18706.hprof shows no objects in the finalization queue.

7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.

8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).

I'm not sure what else to try or where else to look. Any suggestions?

Cheers,
Raman Gupta
)
Raman,

The gc.log looks like it has the young collections
filtered out. Is that right? If so, please upload
the complete log.

Jon

On 05/27/11 10:57, Raman Gupta wrote:
> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>
> Linux RHEL 5.6
>
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>
> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>
> Startup parameters:
>
> -server
> -Xms256m
> -Xmx256m
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:MaxPermSize=64m
> -verbose:gc
> -XX:-UseGCOverheadLimit
> -XX:+DisableExplicitGC
> -XX:+UseParallelGC
> -XX:+UseFastAccessorMethods
> -XX:AdaptiveSizePolicyOutputInterval=1
> -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCApplicationConcurrentTime
> -XX:+PrintGCApplicationStoppedTime
>
> The complete GC log is available here:
>
> http://dl.dropbox.com/u/3430279/gc.log
>
> but here is a short snippet from that log before and after the OOM:
>
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
> Total time for which application threads were stopped: 1.2760680 seconds
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid18706.hprof ...
> Application time: 0.9442610 seconds
> Total time for which application threads were stopped: 2.4584870 seconds
> Heap dump file created [83874513 bytes in 3.330 secs]
> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>
> Note that:
>
> 1) I can reproduce this easily.
>
> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>
> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>
> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>
> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>
> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>
> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>
> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>
> I'm not sure what else to try or where else to look. Any suggestions?
>
> Cheers,
> Raman Gupta
)
+1, can't see anything in these logs that indicates you're heading to an OOME due to Java heap. But, incomplete log...

Regards,
Kirk

On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:

> Raman,
>
> The gc.log looks like it has the young collections
> filtered out. Is that right? If so, please upload
> the complete log.
>
> Jon
>
> On 05/27/11 10:57, Raman Gupta wrote:
>> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>>
>> Linux RHEL 5.6
>>
>> java version "1.6.0_24"
>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>>
>> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>>
>> Startup parameters:
>>
>> -server
>> -Xms256m
>> -Xmx256m
>> -XX:+HeapDumpOnOutOfMemoryError
>> -XX:MaxPermSize=64m
>> -verbose:gc
>> -XX:-UseGCOverheadLimit
>> -XX:+DisableExplicitGC
>> -XX:+UseParallelGC
>> -XX:+UseFastAccessorMethods
>> -XX:AdaptiveSizePolicyOutputInterval=1
>> -XX:+PrintGCDateStamps
>> -XX:+PrintGCTimeStamps
>> -XX:+PrintGCDetails
>> -XX:+PrintGCApplicationConcurrentTime
>> -XX:+PrintGCApplicationStoppedTime
>>
>> The complete GC log is available here:
>>
>> http://dl.dropbox.com/u/3430279/gc.log
>>
>> but here is a short snippet from that log before and after the OOM:
>>
>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>> Total time for which application threads were stopped: 1.2760680 seconds
>> java.lang.OutOfMemoryError: Java heap space
>> Dumping heap to java_pid18706.hprof ...
>> Application time: 0.9442610 seconds
>> Total time for which application threads were stopped: 2.4584870 seconds
>> Heap dump file created [83874513 bytes in 3.330 secs]
>> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>>
>> Note that:
>>
>> 1) I can reproduce this easily.
>>
>> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>>
>> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>>
>> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>>
>> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>>
>> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>>
>> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>>
>> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>>
>> I'm not sure what else to try or where else to look. Any suggestions?
>>
>> Cheers,
>> Raman Gupta

)
My bad -- I filtered out the young GCs with an errant grep command...
this log should be ok:

http://dl.dropbox.com/u/3430279/gc.log

This is from a different test run than the one before, but
demonstrates the same "problem".

Cheers,
Raman

On 06/01/2011 01:35 PM, Charles K Pepperdine wrote:
> +1, can't see anything in these logs that indicates you're heading to an OOME due to Java heap. But, incomplete log...
>
> Regards,
> Kirk
>
> On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:
>
>> Raman,
>>
>> The gc.log looks like it has the young collections
>> filtered out. Is that right? If so, please upload
>> the complete log.
>>
>> Jon
>>
>> On 05/27/11 10:57, Raman Gupta wrote:
>>> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>>>
>>> Linux RHEL 5.6
>>>
>>> java version "1.6.0_24"
>>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>>>
>>> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>>>
>>> Startup parameters:
>>>
>>> -server
>>> -Xms256m
>>> -Xmx256m
>>> -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:MaxPermSize=64m
>>> -verbose:gc
>>> -XX:-UseGCOverheadLimit
>>> -XX:+DisableExplicitGC
>>> -XX:+UseParallelGC
>>> -XX:+UseFastAccessorMethods
>>> -XX:AdaptiveSizePolicyOutputInterval=1
>>> -XX:+PrintGCDateStamps
>>> -XX:+PrintGCTimeStamps
>>> -XX:+PrintGCDetails
>>> -XX:+PrintGCApplicationConcurrentTime
>>> -XX:+PrintGCApplicationStoppedTime
>>>
>>> The complete GC log is available here:
>>>
>>> http://dl.dropbox.com/u/3430279/gc.log
>>>
>>> but here is a short snippet from that log before and after the OOM:
>>>
>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>> Total time for which application threads were stopped: 1.2760680 seconds
>>> java.lang.OutOfMemoryError: Java heap space
>>> Dumping heap to java_pid18706.hprof ...
>>> Application time: 0.9442610 seconds
>>> Total time for which application threads were stopped: 2.4584870 seconds
>>> Heap dump file created [83874513 bytes in 3.330 secs]
>>> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>>>
>>> Note that:
>>>
>>> 1) I can reproduce this easily.
>>>
>>> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>>>
>>> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>>>
>>> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>>>
>>> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>>>
>>> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>>>
>>> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>>>
>>> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>>>
>>> I'm not sure what else to try or where else to look. Any suggestions?
>>>
>>> Cheers,
>>> Raman Gupta
>
)
2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.65 (attempted to grow)
Tenured generation: 0.54 (attempted to grow)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.01 (attempted to grow)
Tenured generation: 0.54 (no change)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]

First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
For some reason that "failed" young gen collection triggers an immediate Full GC.

Bug in the collector? Did you check the bug database?

Regards,
Kirk
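
The headroom arithmetic in this message can be reproduced mechanically. The following Java sketch is an illustration only: the class OldGenHeadroom and its method headroomK are hypothetical names, and the regular expression assumes the "[PSOldGen: used-before->used-after(capacity)]" layout seen in the log lines quoted here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal sketch: old-generation headroom from a ParallelGC "Full GC" log line. */
class OldGenHeadroom {

    // Matches e.g. "[PSOldGen: 169093K->62518K(174784K)]".
    private static final Pattern OLD_GEN =
            Pattern.compile("\\[PSOldGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    /**
     * Returns {freeBeforeK, freeAfterK}: old-gen capacity minus occupancy
     * before and after the collection, in kilobytes.
     */
    static long[] headroomK(String logLine) {
        Matcher m = OLD_GEN.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no PSOldGen entry in: " + logLine);
        }
        long usedBefore = Long.parseLong(m.group(1));
        long usedAfter = Long.parseLong(m.group(2));
        long capacity = Long.parseLong(m.group(3));
        return new long[] { capacity - usedBefore, capacity - usedAfter };
    }

    public static void main(String[] args) {
        // Old-gen portion of the first Full GC line quoted in this thread.
        String firstFullGc =
                "[Full GC [PSYoungGen: 7662K->0K(72896K)] "
                + "[PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K)";
        long[] h = headroomK(firstFullGc);
        System.out.println("free before: " + h[0] + "K, free after: " + h[1] + "K");
    }
}
```

For the first Full GC line this gives 5691K free before the collection, matching the figure above, and 112266K free after it.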

)
I did check the database but didn't find anything relevant. My search
terms may not be optimal, though I did scan through all the results
returned by "java.lang.OutOfMemoryError: Java heap space" as well as
"0K->0K".

I also suspected a bug in the collector and so I tried the same test
with the G1 collector, with the same OOM result. I didn't save the log
from the G1 test, but I can quite easily redo the test with any set of
JVM parameters that may be helpful in debugging -- the OOM seems to be
easily and consistently reproducible with this application.

Cheers,
Raman

On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.65 (attempted to grow)
> Tenured generation: 0.54 (attempted to grow)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.01 (attempted to grow)
> Tenured generation: 0.54 (no change)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>
> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
> For some reason that "failed" young gen collection triggers an immediate Full GC.
>
> Bug in the collector? Did you check the bug database?
>
> Regards,
> Kirk
>
)
Are you trying to create a humungous object or array? Accidentally?

On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:

> I did check the database but didn't find anything relevant. My search
> terms may not be optimal, though I did scan through all the results
> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
> "0K->0K".
>
> I also suspected a bug in the collector and so I tried the same test
> with the G1 collector, with the same OOM result. I didn't save the log
> from the G1 test, but I can quite easily redo the test with any set of
> JVM parameters that may be helpful in debugging -- the OOM seems to be
> easily and consistently reproducible with this application.
>
> Cheers,
> Raman
>

)
I do tend to think that somewhere a large object or array is being
created. In particular, Infinispan is one library we are using that
may be allocating large chunks of memory -- indeed, replacing
Infinispan with a local cache does seem to "fix" the problem.

However, more information from the JVM would really be useful in
isolating the offending code in Infinispan. Ideally,

a) any large allocations should show up as part of the heap dump if
the allocation succeeded but then some other subsequent code caused
the OOM, or

b) if the allocation itself failed, the OOM exception should include a
stack trace that would allow me to isolate the allocation point (as
it does normally, but for some reason in this case doesn't).

In this case the heap dump shows plenty of room in heap, and there is
no stack trace at the OOM, so I don't really have any way to isolate
the offending allocation point. In which situations does the OOM
exception get printed without an associated stack trace?

Cheers,
Raman


On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
> are you trying to create a humungous object or array? Accidentally?
)
If your code is not catching the OOM exception, you'd
expect to see the stack trace when the program dies.
If it catches the exception and carries on, you'd want
it to print the exception detail. I don't know of
cases where the exception would just disappear.

In your case the report to stdout/stderr (?) that an OOM occurred and that
the heap is being dumped comes from inside the JVM,
because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
After this point, your allocating thread would have gotten
an OOME, which it probably caught and swallowed; hence
the absence of the stack trace you would normally see. You
will want to look at your Infinispan code to see how
it deals with the inability to allocate said large objects.
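
To make the catch-and-swallow scenario concrete, here is a minimal sketch. It is not
Infinispan code; the class and method names are made up for illustration. The default
request of 64M longs (about 512 MB) is chosen only so that it should not fit in a small
heap such as the -Xmx256m used in this thread.

```java
final class SwallowedOome {
    // Hypothetical helper: tries to allocate a long[] of the given length and
    // reports whether the JVM satisfied the request.
    static boolean tryAllocate(int length) {
        try {
            long[] big = new long[length];
            return big.length == length;
        } catch (OutOfMemoryError swallowed) {
            // No logging and no rethrow: the stack trace that would point at
            // the allocation site is silently discarded here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed default: 64M longs, roughly 512 MB of array data.
        int length = args.length > 0 ? Integer.parseInt(args[0]) : 64 * 1024 * 1024;
        System.out.println("allocation succeeded: " + tryAllocate(length));
        System.out.println("application carries on");
    }
}
```

Run with a small heap and -XX:+HeapDumpOnOutOfMemoryError, the JVM itself should still
report the OOM and write the dump, while the application prints no stack trace because
the error never propagates.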

Recall that object size is limited by the size of and
available space in the largest area (Eden or Old) in your
Java heap. As Kirk noted, the full gc was to attempt allocation
of an object that didn't fit into the available space in
Eden or in Old (so from that you can estimate the size of
the request).
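
As a rough worked example of that estimate, the sketch below plugs in the figures from
the second Full GC line in the log (PSOldGen 52158K used of 174784K, PSYoungGen capacity
57856K). The class and method names are hypothetical, and the result is only an
approximate lower bound: the reported young capacity includes a survivor space, so Eden
itself is somewhat smaller.

```java
final class RequestSizeEstimate {
    // Free space in a generation, in K, from its capacity and post-GC usage.
    static long freeKb(long capacityKb, long usedKb) {
        return capacityKb - usedKb;
    }

    // The failed request must have been larger than the biggest free area.
    // Young gen is empty after the Full GC, so its capacity bounds Eden from above.
    static long lowerBoundKb(long oldCapacityKb, long oldUsedKb, long youngCapacityKb) {
        return Math.max(freeKb(oldCapacityKb, oldUsedKb), youngCapacityKb);
    }

    public static void main(String[] args) {
        long boundKb = lowerBoundKb(174784, 52158, 57856);
        System.out.printf("request was probably larger than %d K (about %.1f MB)%n",
                boundKb, boundKb / 1024.0);
    }
}
```

With these numbers the free space in Old is 174784K - 52158K = 122626K, so the request
that failed was probably larger than about 120 MB.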

Note also that the JDK libraries will resize hashtables under
you, and that can also cause large allocation requests
(but I don't know how they handle OOMs resulting from such
allocations).

-- ramki

)
Well, GC is somewhat orthogonal to what your application is up to, except for this special case. I've cc'ed Manik on this one; maybe he's had someone run into it before.

Regards,
Kirk

)
It would be *really* handy if there were a switch like:

-XX:+StackTraceOnOutOfMemoryError

to force the stack trace to be shown. Obviously looking at every line
of code of every library my application uses, including core JDK
libraries, for code paths where large amounts of heap may be allocated
and the associated OOME is caught and swallowed, is pretty much
impossible.

I think my next step is to increase the max heap size to a large value
which hopefully allows the large allocation to occur without failure,
and then periodically take heap dumps to isolate it.
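
One way the periodic dumps could be taken from inside the application is sketched
below, using the HotSpot-specific HotSpotDiagnosticMXBean. The class name, file prefix,
dump count and interval are all made-up values for illustration. Note that
ManagementFactory.getPlatformMXBean(Class) needs Java 7 or later; on a Java 6 VM such as
the one in this thread, ManagementFactory.newPlatformMXBeanProxy would be needed instead.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

final class PeriodicHeapDumper {
    // Builds a numbered dump file name, e.g. "heap-007.hprof".
    static String dumpFileName(String prefix, int index) {
        return String.format("%s-%03d.hprof", prefix, index);
    }

    // Writes a heap dump of the current JVM; fails if the file already exists.
    static void dump(String file) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file, true); // true = dump only live (reachable) objects
    }

    public static void main(String[] args) throws Exception {
        int count = 3;             // assumed number of dumps
        long intervalMs = 60_000L; // assumed interval between dumps
        for (int i = 0; i < count; i++) {
            dump(dumpFileName("heap", i));
            Thread.sleep(intervalMs);
        }
    }
}
```

In a real application the loop would more likely run on a daemon thread started at
startup, so that the dumps cover the window in which the large allocation happens.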

Thanks,
Raman

)
You "just" have to find all the places where OOME is caught.
Hopefully there aren't too many of those?

-- ramki

)
Sorry, sent previous email without addressing all of the issues.

On 6/2/2011 6:15 PM, Raman Gupta wrote:
> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError

Yes, that would be handy, and probably not too difficult.
But I also wonder whether something like -XX:OnOutOfMemoryError or
the like would already get you enough info to get close to
the problem ... (although maybe, because it's executed in
a separate shell, by the time the command executes the
process has likely gone well past the point when the problem
occurred).

>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.

Try to look for places where OOME (or supertype?) is caught. I am
hoping there aren't too many of those...
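
A crude way to search for such catch sites is sketched below, assuming the library
sources are available on disk. It is a line-based regex scan, not a parser, so it can
miss catch clauses split across lines and can flag matches inside comments; the class
name and the list of supertypes are choices made for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class OomeCatchFinder {
    // Matches a single-line catch clause naming OutOfMemoryError or a supertype,
    // including multi-catch forms such as "catch (IOException | Error e)".
    private static final Pattern CATCH = Pattern.compile(
            "catch\\s*\\([^)]*\\b(?:OutOfMemoryError|VirtualMachineError|Error|Throwable)\\b[^)]*\\)");

    static boolean catchesOomeOrSupertype(String line) {
        return CATCH.matcher(line).find();
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> sources;
        try (Stream<Path> paths = Files.walk(root)) {
            sources = paths.filter(p -> p.toString().endsWith(".java"))
                           .collect(Collectors.toList());
        }
        for (Path source : sources) {
            List<String> lines;
            try {
                lines = Files.readAllLines(source, StandardCharsets.UTF_8);
            } catch (IOException unreadable) {
                continue; // skip files that cannot be read as UTF-8
            }
            for (int i = 0; i < lines.size(); i++) {
                if (catchesOomeOrSupertype(lines.get(i))) {
                    System.out.println(source + ":" + (i + 1) + ": " + lines.get(i).trim());
                }
            }
        }
    }
}
```

Each reported site then needs a manual look to see whether the caught error is logged,
rethrown, or silently dropped.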

>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.

Yes, that seems reasonable, or maybe use an allocation profiler
with the larger heap and find it that way...

-- ramki

)
You should get a stack trace; something must be eating it.

Regards,
Kirk

On Jun 3, 2011, at 3:15 AM, Raman Gupta wrote:

> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError
>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.
>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.
>
> Thanks,
> Raman
>
> On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
>> If your code is not catching the OOM exception you'd
>> expect to see the stack retrace when the program dies.
>> If it catches the exception and carries on, you'd want
>> it to print the exception detail. I don't know of
>> cases where the exception would just disappear.
>>
>> In your case the report to stdout/stderr(?)that an OOM occurred and that
>> the heap is being dumped comes from inside the JVM
>> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
>> After this point, your allocating thread would have gotten
>> an OOME which it probably caught and swallowed, and hence
>> the silence wrt the stack retrace you would normally see. You
>> will want to look at your Infinispan code to see how
>> it deals with the inability to allocate said large objects.
>>
>> Recall that object size is limited by the size of and
>> available space in the largest area (Eden or Old) in your
>> Java heap. As Kirk noted, the full gc was to attempt allocation
>> of an object that didn't fit into the available space in
>> Eden or in Old (so from that you can estimate the size of
>> the request).
>>
>> Note also that the JDK libraries will resize hashtables under
>> you and that can also cause large allocation requests
>> (but i don't know how they handle OOM's resulting from such
>> allocations).
>>
>> -- ramki
>>
>> On 06/02/11 09:56, Raman Gupta wrote:
>>> I do tend to think that somewhere a large object or array is being
>>> created. In particular, Infinispan is one library we are using that
>>> may be allocating large chunks of memory -- indeed, replacing
>>> Infinispan with a local cache does seem to "fix" the problem.
>>>
>>> However, more information from the JVM would really be useful in
>>> isolating the offending code in Infinispan. Ideally,
>>>
>>> a) any large allocations should show up as part of the heap dump if
>>> the allocation succeeded but then some other subsequent code caused
>>> the OOM, or
>>>
>>> b) if the allocation itself failed, the OOM exception should include a
>>> stack trace that would allow me to isolate the allocation point (as
>>> it does normally, but for some reason in this case doesn't).
>>>
>>> In this case the heap dump shows plenty of room in heap, and there is
>>> no stack trace at the OOM, so I don't really have any way to isolate
>>> the offending allocation point. In which situations does the OOM
>>> exception get printed without an associated stack trace?
>>>
>>> Cheers,
>>> Raman
>>>
>>>
>>> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
>>>> are you trying to create a humungous object or array? Accidentally?
>>>>
>>>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>>>>
>>>>> I did check the database but didn't find anything relevant. My search
>>>>> terms may not be optimal, though I did scan through all the results
>>>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>>>>> "0K->0K".
>>>>>
>>>>> I also suspected a bug in the collector and so I tried the same test
>>>>> with the G1 collector, with the same OOM result. I didn't save the
>>>>> log
>>>>> from the G1 test, but I can quite easily redo the test with any
>>>>> set of
>>>>> JVM parameters that may be helpful in debugging -- the OOM seems
>>>>> to be
>>>>> easily and consistently reproducible with this application.
>>>>>
>>>>> Cheers,
>>>>> Raman
>>>>>

)
Y. Srinivas Ramakrishna wrote:
> Sorry, sent previous email without addressing all of the issues.
>
> On 6/2/2011 6:15 PM, Raman Gupta wrote:
> > It would be *really* handy if there were a switch like:
> >
> > -XX:+StackTraceOnOutOfMemoryError
>
> Yes that would be handy, and probably not too difficult.
> But I wonder also if something like OnOutOfMemoryError or
> like would already get you enough info to get close to
> the problem ... (although may be because it's executed in
> a separate shell, by the time the command executes the
> process has likely gone well past the point when the problem
> occurred).

No need to worry, the OnOutOfMemoryError commands are run while the
JVM is at a safepoint. This worked for me:

java -XX:OnOutOfMemoryError='jstack %p' ...

-John

> > to force the stack trace to be shown. Obviously looking at every line
> > of code of every library my application uses, including core JDK
> > libraries, for code paths where large amounts of heap may be allocated
> > and the associated OOME is caught and swallowed, is pretty much
> > impossible.
>
> Try to look for places where OOME (or supertype?) is caught. I am
> hoping there aren't too many of those...
>
> >
> > I think my next step is to increase the max heap size to a large value
> > which hopefully allows the large allocation to occur without failure,
> > and then periodically take heap dumps to isolate it.
>
> Yes that seems reasonable, or may be use an allocation profiler
> with the larger heap and find it that way...
>
> -- ramki
>
> >
> > Thanks,
> > Raman
> >
>
)
On 06/03/2011 03:28 AM, John Coomes wrote:
> Y. Srinivas Ramakrishna wrote:
>> Sorry, sent previous email without addressing all of the issues.
>>
>> On 6/2/2011 6:15 PM, Raman Gupta wrote:
>>> It would be *really* handy if there were a switch like:
>>>
>>> -XX:+StackTraceOnOutOfMemoryError
>>
>> Yes that would be handy, and probably not too difficult.
>> But I wonder also if something like OnOutOfMemoryError or
>> like would already get you enough info to get close to
>> the problem ... (although may be because it's executed in
>> a separate shell, by the time the command executes the
>> process has likely gone well past the point when the problem
>> occurred).
>
> No need to worry, the OnOutOfMemoryError commands are run while the
> JVM is at a safepoint. This worked for me:
>
> java -XX:OnOutOfMemoryError='jstack %p' ...
>
> -John

Excellent -- will try this later today.

I did a quick search for places where OOME is caught and swallowed and
found a few places within the JDK (such as direct ByteBuffer
allocation), as well as a couple places in other libraries such as
commons-pool and jgroups, the latter of which is used by Infinispan
(though in some cases, but not all, those are logged before being
swallowed). In short though, I still don't definitively know where the
problem allocation is. So running jstack via OnOutOfMemoryError sounds
like it is just the ticket.

Cheers,
Raman
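
A minimal sketch of the catch-and-swallow pattern discussed here, for illustration only. It assumes nothing about how the JDK, commons-pool, jgroups, or Infinispan actually handle the error; the class and method names (SwallowedOomDemo, allocateQuietly, allocateOrDescribe) are hypothetical. The first method drops the OutOfMemoryError, so no stack trace is ever printed; the second captures the trace before recovering.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical demo; the class and method names are illustrative only. */
final class SwallowedOomDemo {

    /** Anti-pattern: the error is caught and dropped, so nothing is printed. */
    static long[] allocateQuietly(int length) {
        try {
            return new long[length];
        } catch (OutOfMemoryError swallowed) {
            return null; // the caller only learns "no buffer", not why or where
        }
    }

    /** Safer variant: capture the stack trace of the failed allocation first. */
    static String allocateOrDescribe(int length) {
        try {
            long[] buffer = new long[length];
            return "allocated " + buffer.length + " longs";
        } catch (OutOfMemoryError e) {
            StringWriter trace = new StringWriter();
            e.printStackTrace(new PrintWriter(trace, true));
            return "failed to allocate long[" + length + "]: " + trace;
        }
    }

    public static void main(String[] args) {
        // Integer.MAX_VALUE longs is roughly 16 GB, far beyond a 256 MB heap.
        System.out.println(allocateQuietly(Integer.MAX_VALUE) == null
                ? "quiet variant: allocation failed silently"
                : "quiet variant: allocation succeeded");
        System.out.println(allocateOrDescribe(Integer.MAX_VALUE));
    }
}
```

Running a small program like this with -XX:+HeapDumpOnOutOfMemoryError or -XX:OnOutOfMemoryError='jstack %p' is one way to check, on your own JVM, what gets reported when the application swallows the error.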
)
On 6/3/2011 7:52 AM, Raman Gupta wrote:
> On 06/03/2011 03:28 AM, John Coomes wrote:
>> Y. Srinivas Ramakrishna wrote:
>>> Sorry, sent previous email without addressing all of the issues.
>>>
>>> On 6/2/2011 6:15 PM, Raman Gupta wrote:
>>>> It would be *really* handy if there were a switch like:
>>>>
>>>> -XX:+StackTraceOnOutOfMemoryError
>>>
>>> Yes that would be handy, and probably not too difficult.
>>> But I wonder also if something like OnOutOfMemoryError or
>>> like would already get you enough info to get close to
>>> the problem ... (although may be because it's executed in
>>> a separate shell, by the time the command executes the
>>> process has likely gone well past the point when the problem
>>> occurred).
>>
>> No need to worry, the OnOutOfMemoryError commands are run while the
>> JVM is at a safepoint. This worked for me:
>>
>> java -XX:OnOutOfMemoryError='jstack %p' ...

Really, are you sure? I'd assumed you spawn off a separate (i.e. asynchronous)
shell process rather than waiting for it to complete while you waited
in the safepoint (i.e. synchronous). It could still be that one is
"lucky" and the shell happens to complete before the safepoint is
exited? Anyway, a good idea to check the code to see if there is
a synchronicity guarantee or one relies on plain luck to sometimes
get something useful (which itself is not bad, but good to know
when it is good fortune vs actual design :-)

-- ramki

>>
>> -John
>
> Excellent -- will try this later today.
>
> I did a quick search for places where OOME is caught and swallowed and
> found a few places within the JDK (such as direct ByteBuffer
> allocation), as well as a couple places in other libraries such as
> commons-pool and jgroups, the latter of which is used by Infinispan
> (though in some cases, but not all, those are logged before being
> swallowed). In short though, I still don't definitively know where the
> problem allocation is. So running jstack via OnOutOfMemoryError sounds
> like it is just the ticket.
>
> Cheers,
> Raman

)

  #18  
03-06-2011 04:32 PM
 

I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.

Linux RHEL 5.6

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

(I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).

Startup parameters:

-server
-Xms256m
-Xmx256m
-XX:+HeapDumpOnOutOfMemoryError
-XX:MaxPermSize=64m
-verbose:gc
-XX:-UseGCOverheadLimit
-XX:+DisableExplicitGC
-XX:+UseParallelGC
-XX:+UseFastAccessorMethods
-XX:AdaptiveSizePolicyOutputInterval=1
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime

The complete GC log is available here:

http://dl.dropbox.com/u/3430279/gc.log

but here is a short snippet from that log before and after the OOM:

2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
Total time for which application threads were stopped: 1.2760680 seconds
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid18706.hprof ...
Application time: 0.9442610 seconds
Total time for which application threads were stopped: 2.4584870 seconds
Heap dump file created [83874513 bytes in 3.330 secs]
2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]

Note that:

1) I can reproduce this easily.

2) This seems to happen only once, shortly after load is first applied following startup -- after that everything seems fine (though I haven't yet verified this with a longer test).

3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.

4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.

5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.

6) The heap dump in pid18706.hprof shows no objects in the finalization queue.

7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.

8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).

I'm not sure what else to try or where else to look. Any suggestions?

Cheers,
Raman Gupta
)
Raman,

The gc.log looks like it has the young collections
filtered out. Is that right? If so, please upload
the complete log.

Jon

)
+1, I can't see anything in these logs that indicates you're heading for an OOME due to Java heap. But the log is incomplete...

Regards,
Kirk

On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:

> Raman,
>
> The gc.log looks like it has the young collections
> filtered out. Is that right? If so, please upload
> the complete log.
>
> Jon
>

)
My bad -- I filtered out the young GCs with an errant grep command...
this log should be ok:

http://dl.dropbox.com/u/3430279/gc.log

This is from a different test run than the one before, but
demonstrates the same "problem".

Cheers,
Raman

On 06/01/2011 01:35 PM, Charles K Pepperdine wrote:
> +1, can't see anything in these logs that indicates you're heading to a OOME due to Java heap. But, incomplete log...
>
> Regards,
> Kirk
>
> On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:
>
>> Raman,
>>
>> The gc.log looks like it has the young collections
>> filtered out. Is that right? If so, please upload
>> the complete log.
>>
>> Jon
>>
>
)
2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.65 (attempted to grow)
Tenured generation: 0.54 (attempted to grow)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.01 (attempted to grow)
Tenured generation: 0.54 (no change)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]

First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
The second collection (the young GC at 148.631) is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the first Full GC and yet no application thread allocated any memory out of young gen.
For some reason that "failed" young gen collection triggers an immediate Full GC.

Bug in the collector? Did you check the bug database?

Regards,
Kirk

)
I did check the database but didn't find anything relevant. My search
terms may not be optimal, though I did scan through all the results
returned by "java.lang.OutOfMemoryError: Java heap space" as well as
"0K->0K".

I also suspected a bug in the collector and so I tried the same test
with the G1 collector, with the same OOM result. I didn't save the log
from the G1 test, but I can quite easily redo the test with any set of
JVM parameters that may be helpful in debugging -- the OOM seems to be
easily and consistently reproducible with this application.

Cheers,
Raman

On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.65 (attempted to grow)
> Tenured generation: 0.54 (attempted to grow)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.01 (attempted to grow)
> Tenured generation: 0.54 (no change)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>
> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
> for some reason that "failed" young gen collection triggers an immediate Full GC.
>
> Bug in the collector? Did you check the bug database?
>
> Regards,
> Kirk
>
)
Are you trying to create a humongous object or array, perhaps accidentally?

On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:

> I did check the database but didn't find anything relevant. My search
> terms may not be optimal, though I did scan through all the results
> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
> "0K->0K".
>
> I also suspected a bug in the collector and so I tried the same test
> with the G1 collector, with the same OOM result. I didn't save the log
> from the G1 test, but I can quite easily redo the test with any set of
> JVM parameters that may be helpful in debugging -- the OOM seems to be
> easily and consistently reproducible with this application.
>
> Cheers,
> Raman
>

)
I do tend to think that somewhere a large object or array is being
created. In particular, Infinispan is one library we are using that
may be allocating large chunks of memory -- indeed, replacing
Infinispan with a local cache does seem to "fix" the problem.

However, more information from the JVM would really be useful in
isolating the offending code in Infinispan. Ideally,

a) any large allocations should show up as part of the heap dump if
the allocation succeeded but then some other subsequent code caused
the OOM, or

b) if the allocation itself failed, the OOM exception should include a
stack trace that would allow me to isolate the allocation point (as
it does normally, but for some reason in this case doesn't).

In this case the heap dump shows plenty of room in heap, and there is
no stack trace at the OOM, so I don't really have any way to isolate
the offending allocation point. In which situations does the OOM
exception get printed without an associated stack trace?

Cheers,
Raman


On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
> are you trying to create a humungous object or array? Accidentally?
>
)
If your code is not catching the OOM exception, you'd
expect to see the stack trace when the program dies.
If it catches the exception and carries on, you'd want
it to print the exception detail. I don't know of
cases where the exception would just disappear.

In your case, the report to stdout/stderr(?) that an OOM occurred and that
the heap is being dumped comes from inside the JVM
because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
After this point, your allocating thread would have gotten
an OOME, which it probably caught and swallowed; hence
the silence with respect to the stack trace you would normally see. You
will want to look at your Infinispan code to see how
it deals with the inability to allocate said large objects.
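
To illustrate the catch-and-swallow pattern described above, here is a small Java sketch. The Allocator interface and both method names are hypothetical, invented for this example; nothing here is taken from Infinispan. The main method assumes that requesting a long[Integer.MAX_VALUE] fails on a typical HotSpot configuration.

```java
// Hypothetical illustration; the Allocator interface and method names are invented here.
final class SwallowedOome {
    interface Allocator {
        Object allocate();
    }

    /** Catches and drops the error: the caller just sees null, and no trace is printed. */
    static Object allocateQuietly(Allocator a) {
        try {
            return a.allocate();
        } catch (OutOfMemoryError e) {
            return null; // swallowed: the allocation site is lost
        }
    }

    /** Same fallback, but records where the allocation failed before carrying on. */
    static Object allocateLogged(Allocator a) {
        try {
            return a.allocate();
        } catch (OutOfMemoryError e) {
            e.printStackTrace(); // keeps the allocation site visible on stderr
            return null;
        }
    }

    public static void main(String[] args) {
        // Assumption: this request is far larger than the heap and is refused.
        Allocator huge = new Allocator() {
            public Object allocate() {
                return new long[Integer.MAX_VALUE];
            }
        };
        System.out.println("quiet result: " + allocateQuietly(huge));
        System.out.println("logged result: " + allocateLogged(huge));
    }
}
```

With the first variant, the only evidence of the failure is whatever the JVM itself reports, for example the heap dump message when -XX:+HeapDumpOnOutOfMemoryError is set.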

Recall that object size is limited by the size of and
available space in the largest area (Eden or Old) in your
Java heap. As Kirk noted, the full gc was to attempt allocation
of an object that didn't fit into the available space in
Eden or in Old (so from that you can estimate the size of
the request).
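
As a rough worked example of that estimate, the sketch below takes the figures reported after the last Full GC in the log (PSYoungGen 0K of 57856K, PSOldGen 52158K of 174784K). It assumes the generations were not resized before the allocation was refused, and that the PSYoungGen capacity overstates free Eden because it includes survivor space. The class and method names are hypothetical.

```java
// Hypothetical helper for the estimate described above.
final class FailedRequestEstimate {
    /** Free space in KB of a region, from its used and capacity figures in the GC log. */
    static long freeK(long usedK, long capacityK) {
        return capacityK - usedK;
    }

    /**
     * Lower bound on the request: it fit in neither region, so it must
     * exceed the larger of the two free spaces.
     */
    static long lowerBoundK(long youngFreeK, long oldFreeK) {
        return Math.max(youngFreeK, oldFreeK);
    }

    public static void main(String[] args) {
        long youngFree = freeK(0, 57856);    // upper bound on free Eden
        long oldFree = freeK(52158, 174784); // 122626K
        System.out.println("request > " + lowerBoundK(youngFree, oldFree) + "K");
    }
}
```

Under those assumptions the failed request would have been larger than about 122626K, i.e. roughly 120 MB.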

Note also that the JDK libraries will resize hashtables under
you, and that can also cause large allocation requests
(but I don't know how they handle OOMs resulting from such
allocations).
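
To get a feel for how large such a resize request can be, here is a rough Java sketch. It assumes a default-constructed java.util.HashMap (initial capacity 16, load factor 0.75, table doubled on each resize) and counts only the new backing array of references, not the entries or the array header. Exact resize points differ slightly between JDK versions, and the helper names are hypothetical.

```java
// Hypothetical estimate of java.util.HashMap backing-array growth.
final class HashTableGrowth {
    /** Approximate table length after `entries` insertions into a default HashMap. */
    static long tableLengthFor(long entries) {
        long capacity = 16;                  // default initial capacity
        while (entries > capacity * 3 / 4) { // default load factor 0.75
            capacity <<= 1;                  // each resize doubles the table
        }
        return capacity;
    }

    /** Approximate bytes in the backing array alone, ignoring the array header. */
    static long tableBytes(long entries, int bytesPerReference) {
        return tableLengthFor(entries) * bytesPerReference;
    }

    public static void main(String[] args) {
        long entries = 10000000L;
        System.out.println("table length: " + tableLengthFor(entries));
        System.out.println("bytes at 8 bytes/ref: " + tableBytes(entries, 8));
    }
}
```

For ten million entries the table has 16777216 slots, so a single resize asks for one contiguous array of about 128 MB with 8-byte references (about 64 MB with 4-byte references), while the old table is still live.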

-- ramki

On 06/02/11 09:56, Raman Gupta wrote:
> I do tend to think that somewhere a large object or array is being
> created. In particular, Infinispan is one library we are using that
> may be allocating large chunks of memory -- indeed, replacing
> Infinispan with a local cache does seem to "fix" the problem.
>
> However, more information from the JVM would really be useful in
> isolating the offending code in Infinispan. Ideally,
>
> a) any large allocations should show up as part of the heap dump if
> the allocation succeeded but then some other subsequent code caused
> the OOM, or
>
> b) if the allocation itself failed, the OOM exception should include a
> stack trace that would allow me to isolate the allocation point (as
> it does normally, but for some reason in this case doesn't).
>
> In this case the heap dump shows plenty of room in heap, and there is
> no stack trace at the OOM, so I don't really have any way to isolate
> the offending allocation point. In which situations does the OOM
> exception get printed without an associated stack trace?
>
> Cheers,
> Raman
>
>
)
Well, GC is somewhat orthogonal to what your application is up to, except for this special case. I've cc'ed Manik on this one; maybe he's had someone run into it before.

Regards,
Kirk

)
It would be *really* handy if there were a switch like:

-XX:+StackTraceOnOutOfMemoryError

to force the stack trace to be shown. Obviously looking at every line
of code of every library my application uses, including core JDK
libraries, for code paths where large amounts of heap may be allocated
and the associated OOME is caught and swallowed, is pretty much
impossible.

I think my next step is to increase the max heap size to a large value
which hopefully allows the large allocation to occur without failure,
and then periodically take heap dumps to isolate it.
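
One way to take such periodic dumps from inside the process is the HotSpot diagnostic MXBean. The following is only a sketch: the class name, file naming scheme, dump count, and interval are hypothetical choices, it works only on HotSpot-based JVMs, and dumpHeap refuses to overwrite an existing file.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper; only the MXBean calls are real JDK API.
final class PeriodicHeapDumper {
    /** Numbered file name so successive dumps do not collide. */
    static String fileName(int sequence) {
        return String.format("heap-%04d.hprof", sequence);
    }

    /** Writes one heap dump of reachable objects to the given path. */
    static void dumpNow(String path) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, true); // true = only live (reachable) objects
    }

    public static void main(String[] args) throws Exception {
        long intervalMillis = 60000L; // assumed interval; tune to the test length
        for (int i = 0; i < 5; i++) {
            dumpNow(fileName(i));
            Thread.sleep(intervalMillis);
        }
    }
}
```

In a real application the loop would run on a background thread rather than in main. Running jmap -dump against the process from outside is an alternative that needs no code changes.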

Thanks,
Raman

On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
> If your code is not catching the OOM exception you'd
> expect to see the stack retrace when the program dies.
> If it catches the exception and carries on, you'd want
> it to print the exception detail. I don't know of
> cases where the exception would just disappear.
>
> In your case the report to stdout/stderr(?)that an OOM occurred and that
> the heap is being dumped comes from inside the JVM
> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
> After this point, your allocating thread would have gotten
> an OOME which it probably caught and swallowed, and hence
> the silence wrt the stack retrace you would normally see. You
> will want to look at your Infinispan code to see how
> it deals with the inability to allocate said large objects.
>
> Recall that object size is limited by the size of and
> available space in the largest area (Eden or Old) in your
> Java heap. As Kirk noted, the full gc was to attempt allocation
> of an object that didn't fit into the available space in
> Eden or in Old (so from that you can estimate the size of
> the request).
>
> Note also that the JDK libraries will resize hashtables under
> you and that can also cause large allocation requests
> (but i don't know how they handle OOM's resulting from such
> allocations).
>
> -- ramki
>
)
You "just" have to find all the places where OOME is caught.
Hopefully there aren't too many of those?

-- ramki

On 6/2/2011 6:15 PM, Raman Gupta wrote:
> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError
>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.
>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.
>
> Thanks,
> Raman
>

)
Sorry, sent previous email without addressing all of the issues.

On 6/2/2011 6:15 PM, Raman Gupta wrote:
> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError

Yes, that would be handy, and probably not too difficult.
But I wonder also if something like -XX:OnOutOfMemoryError or
the like would already get you enough info to get close to
the problem ... (although maybe, because it's executed in
a separate shell, by the time the command executes the
process has likely gone well past the point when the problem
occurred).

>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.

Try to look for places where OOME (or supertype?) is caught. I am
hoping there aren't too many of those...
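
Where source is available, that search can be automated. Below is a minimal Java sketch that flags catch clauses naming OutOfMemoryError or one of its supertypes. The class name and the regular expression are assumptions made for illustration; it looks at source text one line at a time, so it will miss clauses split across lines and cannot see into libraries shipped only as bytecode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical scanner for catch clauses that could intercept an OutOfMemoryError.
final class OomeCatchFinder {
    // Catch clauses naming OutOfMemoryError or one of its supertypes.
    private static final Pattern CATCH = Pattern.compile(
            "catch\\s*\\([^)]*\\b(OutOfMemoryError|VirtualMachineError|Error|Throwable)\\b");

    /** Returns the 1-based numbers of the lines in `source` that contain such a clause. */
    static List<Integer> findCatchSites(String source) {
        List<Integer> hits = new ArrayList<Integer>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (CATCH.matcher(lines[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample = "try {\n  work();\n} catch (IOException e) {\n  log(e);\n"
                + "} catch (Throwable t) {\n}\n";
        System.out.println("suspicious lines: " + findCatchSites(sample));
    }
}
```

Each hit still has to be read by hand to see whether the handler logs, rethrows, or silently drops the error.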

>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.

Yes, that seems reasonable, or maybe use an allocation profiler
with the larger heap and find it that way...

-- ramki

>
> Thanks,
> Raman
>
> On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
>> If your code is not catching the OOM exception you'd
>> expect to see the stack retrace when the program dies.
>> If it catches the exception and carries on, you'd want
>> it to print the exception detail. I don't know of
>> cases where the exception would just disappear.
>>
>> In your case the report to stdout/stderr(?)that an OOM occurred and that
>> the heap is being dumped comes from inside the JVM
>> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
>> After this point, your allocating thread would have gotten
>> an OOME which it probably caught and swallowed, and hence
>> the silence wrt the stack retrace you would normally see. You
>> will want to look at your Infinispan code to see how
>> it deals with the inability to allocate said large objects.
>>
>> Recall that object size is limited by the size of and
>> available space in the largest area (Eden or Old) in your
>> Java heap. As Kirk noted, the full gc was to attempt allocation
>> of an object that didn't fit into the available space in
>> Eden or in Old (so from that you can estimate the size of
>> the request).
>>
>> Note also that the JDK libraries will resize hashtables under
>> you and that can also cause large allocation requests
>> (but i don't know how they handle OOM's resulting from such
>> allocations).
>>
>> -- ramki
>>
>> On 06/02/11 09:56, Raman Gupta wrote:
>>> I do tend to think that somewhere a large object or array is being
>>> created. In particular, Infinispan is one library we are using that
>>> may be allocating large chunks of memory -- indeed, replacing
>>> Infinispan with a local cache does seem to "fix" the problem.
>>>
>>> However, more information from the JVM would really be useful in
>>> isolating the offending code in Infinispan. Ideally,
>>>
>>> a) any large allocations should show up as part of the heap dump if
>>> the allocation succeeded but then some other subsequent code caused
>>> the OOM, or
>>>
>>> b) if the allocation itself failed, the OOM exception should include a
>>> stack trace that would allow me to isolate the allocation point (as
>>> it does normally, but for some reason in this case doesn't).
>>>
>>> In this case the heap dump shows plenty of room in heap, and there is
>>> no stack trace at the OOM, so I don't really have any way to isolate
>>> the offending allocation point. In which situations does the OOM
>>> exception get printed without an associated stack trace?
>>>
>>> Cheers,
>>> Raman
>>>
>>>
>>> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
>>>> are you trying to create a humungous object or array? Accidentally?
>>>>
>>>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>>>>
>>>>> I did check the database but didn't find anything relevant. My search
>>>>> terms may not be optimal, though I did scan through all the results
>>>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>>>>> "0K->0K".
>>>>>
>>>>> I also suspected a bug in the collector and so I tried the same test
>>>>> with the G1 collector, with the same OOM result. I didn't save the
>>>>> log
>>>>> from the G1 test, but I can quite easily redo the test with any
>>>>> set of
>>>>> JVM parameters that may be helpful in debugging -- the OOM seems
>>>>> to be
>>>>> easily and consistently reproducible with this application.
>>>>>
>>>>> Cheers,
>>>>> Raman
>>>>>
>>>>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>>>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen:
>>>>>> 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)]
>>>>>> 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)],
>>>>>> 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>>>> GC overhead (%)
>>>>>> Young generation: 2.65 (attempted to grow)
>>>>>> Tenured generation: 0.54 (attempted to grow)
>>>>>> Tenuring threshold: (attempted to decrease to balance GC
>>>>>> costs) = 1
>>>>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen:
>>>>>> 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times:
>>>>>> user=0.00 sys=0.00, real=0.00 secs] UseAdaptiveSizePolicy
>>>>>> actions to meet *** throughput goal ***
>>>>>> GC overhead (%)
>>>>>> Young generation: 2.01 (attempted to grow)
>>>>>> Tenured generation: 0.54 (no change)
>>>>>> Tenuring threshold: (attempted to decrease to balance GC
>>>>>> costs) = 1
>>>>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen:
>>>>>> 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)]
>>>>>> 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)],
>>>>>> 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>>>> First Full GC looks normal as PSOldGen is (174784-169093)K =
>>>>>> 5691K from being full. Nothing happening in Perm.
>>>>>> The second is where things start to get weird. I don't see why
>>>>>> that GC was called. Stranger still, it's ~800ms *after* the full
>>>>>> gc and yet no application thread allocated any memory out of
>>>>>> young gen.
>>>>>> for some reason that "failed" young gen collection triggers an
>>>>>> immediate Full GC.
>>>>>>
>>>>>> Bug in the collector? Did you check the bug database?
>>>>>>
>>>>>> Regards,
>>>>>> Kirk
>>>>>>

You should get a stack trace; something must be eating it.

Regards,
Kirk
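
A minimal Java sketch of how a stack trace can get "eaten". The class and method names are hypothetical and are not taken from Infinispan, JGroups or the JDK. It assumes a HotSpot JVM, where a request for Integer.MAX_VALUE longs fails with an OutOfMemoryError. The quiet variant returns a fallback and prints nothing, which would match the silence seen in the log; the loud variant reports the allocation site before falling back.

```java
// Hypothetical demo; not code from any library discussed in this thread.
final class SwallowedOomDemo {

    /** Requests an array of about 16 GB, which is assumed to fail on a typical HotSpot setup. */
    static long[] allocateHuge() {
        return new long[Integer.MAX_VALUE];
    }

    /** Swallows the error: the caller only sees null and nothing is printed. */
    static long[] allocateQuietly() {
        try {
            return allocateHuge();
        } catch (Throwable t) { // Throwable also covers OutOfMemoryError
            return null;
        }
    }

    /** Same fallback, but the stack trace is printed to stderr first. */
    static long[] allocateLoudly() {
        try {
            return allocateHuge();
        } catch (OutOfMemoryError e) {
            e.printStackTrace();
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("quiet result: " + allocateQuietly());
        System.out.println("loud result: " + allocateLoudly());
    }
}
```

Searching a code base for catch blocks on Throwable, Error or OutOfMemoryError that look like allocateQuietly() is one way to narrow down where a trace might be disappearing.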

On Jun 3, 2011, at 3:15 AM, Raman Gupta wrote:

> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError
>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.
>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.
>
> Thanks,
> Raman
>

Y. Srinivas Ramakrishna wrote:
> Sorry, sent previous email without addressing all of the issues.
>
> On 6/2/2011 6:15 PM, Raman Gupta wrote:
> > It would be *really* handy if there were a switch like:
> >
> > -XX:+StackTraceOnOutOfMemoryError
>
> Yes that would be handy, and probably not too difficult.
> But I wonder also if something like OnOutOfMemoryError or
> like would already get you enough info to get close to
> the problem ... (although may be because it's executed in
> a separate shell, by the time the command executes the
> process has likely gone well past the point when the problem
> occurred).

No need to worry, the OnOutOfMemoryError commands are run while the
JVM is at a safepoint. This worked for me:

java -XX:OnOutOfMemoryError='jstack %p' ...

-John

> > to force the stack trace to be shown. Obviously looking at every line
> > of code of every library my application uses, including core JDK
> > libraries, for code paths where large amounts of heap may be allocated
> > and the associated OOME is caught and swallowed, is pretty much
> > impossible.
>
> Try to look for places where OOME (or supertype?) is caught. I am
> hoping there aren't too many of those...
>
> >
> > I think my next step is to increase the max heap size to a large value
> > which hopefully allows the large allocation to occur without failure,
> > and then periodically take heap dumps to isolate it.
>
> Yes that seems reasonable, or may be use an allocation profiler
> with the larger heap and find it that way...
>
> -- ramki
>
> >
> > Thanks,
> > Raman
> >
>
On 06/03/2011 03:28 AM, John Coomes wrote:
> Y. Srinivas Ramakrishna wrote:
>> Sorry, sent previous email without addressing all of the issues.
>>
>> On 6/2/2011 6:15 PM, Raman Gupta wrote:
>>> It would be *really* handy if there were a switch like:
>>>
>>> -XX:+StackTraceOnOutOfMemoryError
>>
>> Yes that would be handy, and probably not too difficult.
>> But I wonder also if something like OnOutOfMemoryError or
>> like would already get you enough info to get close to
>> the problem ... (although may be because it's executed in
>> a separate shell, by the time the command executes the
>> process has likely gone well past the point when the problem
>> occurred).
>
> No need to worry, the OnOutOfMemoryError commands are run while the
> JVM is at a safepoint. This worked for me:
>
> java -XX:OnOutOfMemoryError='jstack %p' ...
>
> -John

Excellent -- will try this later today.

I did a quick search for places where OOME is caught and swallowed and
found a few places within the JDK (such as direct ByteBuffer
allocation), as well as a couple places in other libraries such as
commons-pool and jgroups, the latter of which is used by Infinispan
(though in some cases, but not all, those are logged before being
swallowed). In short though, I still don't definitively know where the
problem allocation is. So running jstack via OnOutOfMemoryError sounds
like it is just the ticket.

Cheers,
Raman
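
For comparison with what `jstack %p` prints, here is a rough in-process sketch that lists every live thread's stack via Thread.getAllStackTraces(). It only illustrates the kind of information a thread dump gives. It is not how the JVM implements -XX:OnOutOfMemoryError, and the class name is made up. Building the string allocates memory, so running this right after a real OOME may itself fail.

```java
import java.util.Map;

// Hypothetical helper; an illustration only, not a replacement for jstack.
final class InProcessThreadDump {

    /** Formats the stack of every live thread, loosely in the style of a thread dump. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName()).append('"')
              .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

In a dump like this, the thread whose top frames sit in an allocation path is the one worth looking at first.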
On 6/3/2011 7:52 AM, Raman Gupta wrote:
> On 06/03/2011 03:28 AM, John Coomes wrote:
>> Y. Srinivas Ramakrishna wrote:
>>> Sorry, sent previous email without addressing all of the issues.
>>>
>>> On 6/2/2011 6:15 PM, Raman Gupta wrote:
>>>> It would be *really* handy if there were a switch like:
>>>>
>>>> -XX:+StackTraceOnOutOfMemoryError
>>>
>>> Yes that would be handy, and probably not too difficult.
>>> But I wonder also if something like OnOutOfMemoryError or
>>> like would already get you enough info to get close to
>>> the problem ... (although may be because it's executed in
>>> a separate shell, by the time the command executes the
>>> process has likely gone well past the point when the problem
>>> occurred).
>>
>> No need to worry, the OnOutOfMemoryError commands are run while the
>> JVM is at a safepoint. This worked for me:
>>
>> java -XX:OnOutOfMemoryError='jstack %p' ...

Really, are you sure? I'd assumed you spawn off a separate (i.e. asynchronous)
shell process rather than waiting for it to complete while you waited
in the safepoint (i.e. synchronous). It could still be that one is
"lucky" and the shell happens to complete before the safepoint is
exited? Anyway, a good idea to check the code to see if there is
a synchronicity guarantee or one relies on plain luck to sometimes
get something useful (which itself is not bad, but good to know
when it is good fortune vs actual design :-)

-- ramki

>>
>> -John
>
> Excellent -- will try this later today.
>
> I did a quick search for places where OOME is caught and swallowed and
> found a few places within the JDK (such as direct ByteBuffer
> allocation), as well as a couple places in other libraries such as
> commons-pool and jgroups, the latter of which is used by Infinispan
> (though in some cases, but not all, those are logged before being
> swallowed). In short though, I still don't definitively know where the
> problem allocation is. So running jstack via OnOutOfMemoryError sounds
> like it is just the ticket.
>
> Cheers,
> Raman

By the way, it would seem that a "safepoint synchronous"
OnOutOfMemoryError execution would restrict what you could do,
just in case that caused a deadlock because the target (self)
might need to be at a non-safepoint to react to that command....

Is there such a documented restriction on what commands can
be run within OnOutOfMemoryError (or even a flat caveat emptor)?

-- ramki



  #19  
03-06-2011 04:50 PM
 

I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.

Linux RHEL 5.6

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

(I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).

Startup parameters:

-server
-Xms256m
-Xmx256m
-XX:+HeapDumpOnOutOfMemoryError
-XX:MaxPermSize=64m
-verbose:gc
-XX:-UseGCOverheadLimit
-XX:+DisableExplicitGC
-XX:+UseParallelGC
-XX:+UseFastAccessorMethods
-XX:AdaptiveSizePolicyOutputInterval=1
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime

The complete GC log is available here:

http://dl.dropbox.com/u/3430279/gc.log

but here is a short snippet from that log before and after the OOM:

2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
Total time for which application threads were stopped: 1.2760680 seconds
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid18706.hprof ...
Application time: 0.9442610 seconds
Total time for which application threads were stopped: 2.4584870 seconds
Heap dump file created [83874513 bytes in 3.330 secs]
2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]

Note that:

1) I can reproduce this easily.

2) This seems to happen only once, shortly after load is applied to the system (which is itself shortly after startup) -- after that everything seems fine (though I haven't yet verified this with a longer test).

3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.

4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.

5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.

6) The heap dump in pid18706.hprof shows no objects in the finalization queue.

7) I note that the OutOfMemoryError in the log has no stack trace associated with it, although one is normally present.

8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
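
The headroom mentioned in item 4 can be checked from the log line itself. Here is a small sketch, assuming the PSOldGen entry has the "used-before->used-after(capacity)" form shown in the snippet above. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading one field of a -XX:+PrintGCDetails line.
final class OldGenHeadroom {

    // Matches entries such as "[PSOldGen: 62518K->52158K(174784K)]".
    private static final Pattern OLD_GEN =
            Pattern.compile("\\[PSOldGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    /** Returns free old-gen space in KB after the collection, or -1 if no PSOldGen entry is found. */
    static long freeAfterKb(String gcLogLine) {
        Matcher m = OLD_GEN.matcher(gcLogLine);
        if (!m.find()) {
            return -1;
        }
        long usedAfterKb = Long.parseLong(m.group(2));
        long capacityKb = Long.parseLong(m.group(3));
        return capacityKb - usedAfterKb;
    }

    public static void main(String[] args) {
        String line = "2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] "
                + "[PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) "
                + "[PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs]";
        System.out.println(freeAfterKb(line) + "K free in old gen");
    }
}
```

For the second Full GC above this gives 174784K - 52158K = 122626K free in the old generation immediately before the OOM was reported.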

I'm not sure what else to try or where else to look. Any suggestions?

Cheers,
Raman Gupta
Raman,

The gc.log looks like it has the young collections
filtered out. Is that right? If so, please upload
the complete log.

Jon

+1, can't see anything in these logs that indicates you're heading to an OOME due to Java heap. But, incomplete log...

Regards,
Kirk

On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:

> Raman,
>
> The gc.log looks like it has the young collections
> filtered out. Is that right? If so, please upload
> the complete log.
>
> Jon
>

)
My bad -- I filtered out the young GCs with an errant grep command...
this log should be ok:

http://dl.dropbox.com/u/3430279/gc.log

This is from a different test run than the one before, but
demonstrates the same "problem".

Cheers,
Raman

On 06/01/2011 01:35 PM, Charles K Pepperdine wrote:
> +1, I can't see anything in these logs that indicates you're heading to an OOME due to Java heap. But it's an incomplete log...
>
> Regards,
> Kirk
>
> On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:
>
>> Raman,
>>
>> The gc.log looks like it has the young collections
>> filtered out. Is that right? If so, please upload
>> the complete log.
>>
>> Jon
>
)
2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.65 (attempted to grow)
Tenured generation: 0.54 (attempted to grow)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.01 (attempted to grow)
Tenured generation: 0.54 (no change)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]

The first Full GC looks normal, as PSOldGen is (174784-169093)K = 5691K from being full. Nothing is happening in Perm.
The second collection (the young GC at 148.631) is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the Full GC and yet no application thread allocated any memory out of young gen.
For some reason that "failed" young gen collection triggers an immediate Full GC.

Bug in the collector? Did you check the bug database?

Regards,
Kirk

)
I did check the database but didn't find anything relevant. My search
terms may not be optimal, though I did scan through all the results
returned by "java.lang.OutOfMemoryError: Java heap space" as well as
"0K->0K".

I also suspected a bug in the collector and so I tried the same test
with the G1 collector, with the same OOM result. I didn't save the log
from the G1 test, but I can quite easily redo the test with any set of
JVM parameters that may be helpful in debugging -- the OOM seems to be
easily and consistently reproducible with this application.

Cheers,
Raman

On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.65 (attempted to grow)
> Tenured generation: 0.54 (attempted to grow)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.01 (attempted to grow)
> Tenured generation: 0.54 (no change)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>
> The first Full GC looks normal, as PSOldGen is (174784-169093)K = 5691K from being full. Nothing is happening in Perm.
> The second collection (the young GC at 148.631) is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the Full GC and yet no application thread allocated any memory out of young gen.
> For some reason that "failed" young gen collection triggers an immediate Full GC.
>
> Bug in the collector? Did you check the bug database?
>
> Regards,
> Kirk
>
)
Are you trying to create a humongous object or array, perhaps accidentally?

On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:

> I did check the database but didn't find anything relevant. My search
> terms may not be optimal, though I did scan through all the results
> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
> "0K->0K".
>
> I also suspected a bug in the collector and so I tried the same test
> with the G1 collector, with the same OOM result. I didn't save the log
> from the G1 test, but I can quite easily redo the test with any set of
> JVM parameters that may be helpful in debugging -- the OOM seems to be
> easily and consistently reproducible with this application.
>
> Cheers,
> Raman

)
I do tend to think that somewhere a large object or array is being
created. In particular, Infinispan is one library we are using that
may be allocating large chunks of memory -- indeed, replacing
Infinispan with a local cache does seem to "fix" the problem.

However, more information from the JVM would really be useful in
isolating the offending code in Infinispan. Ideally,

a) any large allocations should show up as part of the heap dump if
the allocation succeeded but then some other subsequent code caused
the OOM, or

b) if the allocation itself failed, the OOM exception should include a
stack trace that would allow me to isolate the allocation point (as
it does normally, but for some reason in this case doesn't).

In this case the heap dump shows plenty of room in heap, and there is
no stack trace at the OOM, so I don't really have any way to isolate
the offending allocation point. In which situations does the OOM
exception get printed without an associated stack trace?

Cheers,
Raman


On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
> Are you trying to create a humongous object or array, perhaps accidentally?
)
If your code is not catching the OOM exception you'd
expect to see the stack trace when the program dies.
If it catches the exception and carries on, you'd want
it to print the exception detail. I don't know of
cases where the exception would just disappear.

In your case the report to stdout/stderr(?) that an OOM occurred and that
the heap is being dumped comes from inside the JVM
because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
After this point, your allocating thread would have gotten
an OOME which it probably caught and swallowed, hence
the silence wrt the stack trace you would normally see. You
will want to look at your Infinispan code to see how
it deals with the inability to allocate said large objects.
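
To illustrate the caught-and-swallowed case, here is a minimal sketch. The class and method names are hypothetical and are not taken from Infinispan or any real library. It forces an OutOfMemoryError with an array length that HotSpot cannot satisfy, which is an assumption made only so the failure is easy to reproduce; the error message may differ from "Java heap space". The first method swallows the error silently, the second prints the trace before falling back.

```java
public class SwallowedOome {

    // Hypothetical library-style method: tries a large allocation and
    // silently falls back when it fails. Nothing is logged, so no stack
    // trace ever reaches stderr.
    static long[] allocateOrFallback(int length) {
        try {
            return new long[length];
        } catch (OutOfMemoryError e) {
            return new long[0]; // swallowed
        }
    }

    // Same allocation, but the failure is reported before falling back,
    // so the allocation point shows up in the output.
    static long[] allocateOrReport(int length) {
        try {
            return new long[length];
        } catch (OutOfMemoryError e) {
            e.printStackTrace();
            return new long[0];
        }
    }

    public static void main(String[] args) {
        long[] quiet = allocateOrFallback(Integer.MAX_VALUE);
        System.out.println("swallowed, fallback length = " + quiet.length);

        long[] loud = allocateOrReport(Integer.MAX_VALUE);
        System.out.println("reported, fallback length = " + loud.length);
    }
}
```

Searching the libraries for `catch (OutOfMemoryError`, `catch (Error` and `catch (Throwable` is one way to find candidates for the first pattern.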

Recall that object size is limited by the size of and
available space in the largest area (Eden or Old) in your
Java heap. As Kirk noted, the full gc was to attempt allocation
of an object that didn't fit into the available space in
Eden or in Old (so from that you can estimate the size of
the request).
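
As a rough sketch of that estimate, the following parses the PSOldGen entry of the last Full GC before the OOM and computes the space left in Old afterwards. The class and method names are made up for illustration, and the regex assumes the -XX:+PrintGCDetails format shown in this thread. Since the request did not fit in Old even after that collection, it must have been larger than the free space reported, which works out to 174784K - 52158K = 122626K, or roughly 120 MB.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RequestSizeEstimate {

    // Matches "[PSOldGen: <before>K-><after>K(<capacity>K)]".
    private static final Pattern OLD_GEN =
            Pattern.compile("\\[PSOldGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    /** Free space in Old (in KB) after the collection, or -1 if the line has no PSOldGen entry. */
    static long oldFreeAfterKb(String logLine) {
        Matcher m = OLD_GEN.matcher(logLine);
        if (!m.find()) {
            return -1;
        }
        long usedAfter = Long.parseLong(m.group(2));
        long capacity = Long.parseLong(m.group(3));
        return capacity - usedAfter;
    }

    public static void main(String[] args) {
        // The last Full GC before the OOM in the posted log.
        String line = "[Full GC [PSYoungGen: 0K->0K(57856K)] "
                + "[PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) "
                + "[PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs]";
        long freeKb = oldFreeAfterKb(line);
        // The failed request did not fit in Old, so it was larger than this.
        System.out.println("Old free after Full GC: " + freeKb + "K");
    }
}
```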

Note also that the JDK libraries will resize hashtables under
you and that can also cause large allocation requests
(but I don't know how they handle OOMs resulting from such
allocations).
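
For a feel of how large such a resize can be, here is a back-of-the-envelope sketch. It assumes the documented java.util.HashMap defaults (initial capacity 16, load factor 0.75, table doubling on resize), ignores the array header, and treats the reference size as a parameter (4 or 8 bytes depending on the JVM). The class and method names are hypothetical.

```java
public class HashTableGrowth {

    /**
     * Smallest power-of-two capacity, starting at 16, whose 0.75
     * load-factor threshold can hold the given number of entries.
     */
    static long tableCapacityFor(long entries) {
        long capacity = 16;
        while (entries > capacity * 3 / 4) {
            capacity *= 2; // each resize allocates a table twice as large
        }
        return capacity;
    }

    /** Approximate size in bytes of the backing array alone (header ignored). */
    static long tableBytes(long entries, int bytesPerReference) {
        return tableCapacityFor(entries) * bytesPerReference;
    }

    public static void main(String[] args) {
        long entries = 10000000L;
        System.out.println("slots: " + tableCapacityFor(entries));
        System.out.println("bytes at 8 per reference: " + tableBytes(entries, 8));
        System.out.println("bytes at 4 per reference: " + tableBytes(entries, 4));
    }
}
```

Under those assumptions a map of 10,000,000 entries needs a 16,777,216-slot table, i.e. a single contiguous array of 128 MB at 8 bytes per reference.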

-- ramki

On 06/02/11 09:56, Raman Gupta wrote:
> I do tend to think that somewhere a large object or array is being
> created. In particular, Infinispan is one library we are using that
> may be allocating large chunks of memory -- indeed, replacing
> Infinispan with a local cache does seem to "fix" the problem.
>
> However, more information from the JVM would really be useful in
> isolating the offending code in Infinispan. Ideally,
>
> a) any large allocations should show up as part of the heap dump if
> the allocation succeeded but then some other subsequent code caused
> the OOM, or
>
> b) if the allocation itself failed, the OOM exception should include a
> stack trace that would allow me to isolate the allocation point (as
> it does normally, but for some reason in this case doesn't).
>
> In this case the heap dump shows plenty of room in heap, and there is
> no stack trace at the OOM, so I don't really have any way to isolate
> the offending allocation point. In which situations does the OOM
> exception get printed without an associated stack trace?
>
> Cheers,
> Raman
>
>
> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
>> are you trying to create a humungous object or array? Accidentally?
>>
>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>>
>>> I did check the database but didn't find anything relevant. My search
>>> terms may not be optimal, though I did scan through all the results
>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>>> "0K->0K".
>>>
>>> I also suspected a bug in the collector and so I tried the same test
>>> with the G1 collector, with the same OOM result. I didn't save the log
>>> from the G1 test, but I can quite easily redo the test with any set of
>>> JVM parameters that may be helpful in debugging -- the OOM seems to be
>>> easily and consistently reproducible with this application.
>>>
>>> Cheers,
>>> Raman
>>>
>>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>> GC overhead (%)
>>>> Young generation: 2.65 (attempted to grow)
>>>> Tenured generation: 0.54 (attempted to grow)
>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>> GC overhead (%)
>>>> Young generation: 2.01 (attempted to grow)
>>>> Tenured generation: 0.54 (no change)
>>>> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>>
>>>> First Full GC looks normal as PSOldGen is (174784-169093)K = 5691K from being full. Nothing happening in Perm.
>>>> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the full gc and yet no application thread allocated any memory out of young gen.
>>>> for some reason that "failed" young gen collection triggers an immediate Full GC.
>>>>
>>>> Bug in the collector? Did you check the bug database?
>>>>
>>>> Regards,
>>>> Kirk
>>>>
)
Well, GC is somewhat orthogonal to what your application is up to, except for this special case. I've cc'ed Manik on this one; maybe he's had someone run into it before.

Regards,
Kirk

On Jun 2, 2011, at 6:56 PM, Raman Gupta wrote:

> I do tend to think that somewhere a large object or array is being
> created. In particular, Infinispan is one library we are using that
> may be allocating large chunks of memory -- indeed, replacing
> Infinispan with a local cache does seem to "fix" the problem.
>
> However, more information from the JVM would really be useful in
> isolating the offending code in Infinispan. Ideally,
>
> a) any large allocations should show up as part of the heap dump if
> the allocation succeeded but then some other subsequent code caused
> the OOM, or
>
> b) if the allocation itself failed, the OOM exception should include a
> stack trace that would allow me to isolate the allocation point (as
> it does normally, but for some reason in this case doesn't).
>
> In this case the heap dump shows plenty of room in heap, and there is
> no stack trace at the OOM, so I don't really have any way to isolate
> the offending allocation point. In which situations does the OOM
> exception get printed without an associated stack trace?
>
> Cheers,
> Raman

)
It would be *really* handy if there were a switch like:

-XX:+StackTraceOnOutOfMemoryError

to force the stack trace to be shown. Obviously looking at every line
of code of every library my application uses, including core JDK
libraries, for code paths where large amounts of heap may be allocated
and the associated OOME is caught and swallowed, is pretty much
impossible.

I think my next step is to increase the max heap size to a large value
which hopefully allows the large allocation to occur without failure,
and then periodically take heap dumps to isolate it.
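
One way to take those periodic dumps from inside the application is sketched below. It assumes a HotSpot JVM, where the com.sun.management.HotSpotDiagnosticMXBean is registered under the name used in the code; the class name, file naming and default interval are my own choices. Running jmap against the process from outside is an alternative.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class PeriodicHeapDump {

    private static final String HOTSPOT_BEAN =
            "com.sun.management:type=HotSpotDiagnostic";

    /** Writes one heap dump to path; returns false if it could not be written. */
    static boolean dump(String path, boolean liveObjectsOnly) {
        try {
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    HOTSPOT_BEAN,
                    HotSpotDiagnosticMXBean.class);
            // The target file must not exist yet.
            bean.dumpHeap(path, liveObjectsOnly);
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Optional arguments: number of dumps, then interval in milliseconds.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 1;
        long intervalMs = args.length > 1 ? Long.parseLong(args[1]) : 60000L;
        String dir = System.getProperty("java.io.tmpdir");

        for (int i = 0; i < count; i++) {
            String name = "heap-" + System.currentTimeMillis() + "-" + i + ".hprof";
            String path = new File(dir, name).getPath();
            System.out.println(path + (dump(path, false) ? " written" : " failed"));
            if (i < count - 1) {
                Thread.sleep(intervalMs);
            }
        }
    }
}
```

Passing false for liveObjectsOnly keeps unreachable objects in the dump, which matters if the large allocation is short-lived.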

Thanks,
Raman

On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
> If your code is not catching the OOM exception you'd
> expect to see the stack trace when the program dies.
> If it catches the exception and carries on, you'd want
> it to print the exception detail. I don't know of
> cases where the exception would just disappear.
>
> In your case the report to stdout/stderr(?) that an OOM occurred and that
> the heap is being dumped comes from inside the JVM
> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
> After this point, your allocating thread would have gotten
> an OOME which it probably caught and swallowed, hence
> the silence wrt the stack trace you would normally see. You
> will want to look at your Infinispan code to see how
> it deals with the inability to allocate said large objects.
>
> Recall that object size is limited by the size of and
> available space in the largest area (Eden or Old) in your
> Java heap. As Kirk noted, the full gc was to attempt allocation
> of an object that didn't fit into the available space in
> Eden or in Old (so from that you can estimate the size of
> the request).
>
> Note also that the JDK libraries will resize hashtables under
> you and that can also cause large allocation requests
> (but I don't know how they handle OOMs resulting from such
> allocations).
>
> -- ramki
)
You "just" have to find all the places where OOME is caught.
Hopefully there aren't too many of those?

-- ramki

On 6/2/2011 6:15 PM, Raman Gupta wrote:
> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError
>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.
>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.
>
> Thanks,
> Raman
>
> On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
>> If your code is not catching the OOM exception you'd
>> expect to see the stack retrace when the program dies.
>> If it catches the exception and carries on, you'd want
>> it to print the exception detail. I don't know of
>> cases where the exception would just disappear.
>>
>> In your case the report to stdout/stderr(?)that an OOM occurred and that
>> the heap is being dumped comes from inside the JVM
>> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
>> After this point, your allocating thread would have gotten
>> an OOME which it probably caught and swallowed, and hence
>> the silence wrt the stack retrace you would normally see. You
>> will want to look at your Infinispan code to see how
>> it deals with the inability to allocate said large objects.

Sorry, sent previous email without addressing all of the issues.

On 6/2/2011 6:15 PM, Raman Gupta wrote:
> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError

Yes, that would be handy, and probably not too difficult.
But I wonder whether something like -XX:OnOutOfMemoryError or
the like would already get you enough info to get close to
the problem ... (although, because the command is executed in
a separate shell, the process may well have gone past the
point where the problem occurred by the time the command
runs).

>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.

Try looking for places where OOME (or a supertype such as Error or
Throwable) is caught. I am hoping there aren't too many of those...
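
As a minimal sketch of the kind of code to search for: a catch of Throwable or Error also catches OutOfMemoryError, and if the handler neither logs nor rethrows, the only sign left is the JVM's own heap-dump message. The class and method names below are hypothetical (not taken from Infinispan, JGroups or the JDK), and the oversized array request is only an assumed way to provoke the error on a typical HotSpot JVM.

```java
public class SwallowedOomDemo {

    /** Hypothetical library-style helper: hides every failure, including OutOfMemoryError. */
    static long[] allocateQuietly(int length) {
        try {
            return new long[length];
        } catch (Throwable t) {
            // Swallowed: nothing is logged, so no stack trace ever appears.
            return null;
        }
    }

    /** Same allocation, but the failure is reported before the code recovers. */
    static long[] allocateLoudly(int length) {
        try {
            return new long[length];
        } catch (OutOfMemoryError e) {
            e.printStackTrace(); // normally points at the allocation site
            return null;
        }
    }

    public static void main(String[] args) {
        // Assumption: a request this large cannot be satisfied on a typical HotSpot JVM.
        int huge = Integer.MAX_VALUE;
        System.out.println("quiet: " + (allocateQuietly(huge) == null ? "failed silently" : "allocated"));
        System.out.println("loud:  " + (allocateLoudly(huge) == null ? "failed with trace" : "allocated"));
    }
}
```

Searching the code base for catch blocks shaped like allocateQuietly is the manual equivalent of what is suggested above.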

>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.

Yes, that seems reasonable, or maybe use an allocation profiler
with the larger heap and find it that way...
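
For the periodic heap dumps, one option is to trigger them from inside the application. This is only a sketch under assumptions: it relies on the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean, uses the Java 7+ ManagementFactory.getPlatformMXBean lookup (a Java 6 JVM would need a different lookup), and the class name, directory, interval and file naming are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class PeriodicHeapDumper {

    /** Writes one heap dump of reachable objects to dir/heap-<sequence>.hprof and returns the path. */
    static Path dumpOnce(Path dir, int sequence) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Path file = dir.resolve("heap-" + sequence + ".hprof");
        try {
            // live = true: dump only objects that are still reachable.
            // The call fails if the target file already exists.
            bean.dumpHeap(file.toString(), true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("heapdumps"); // hypothetical location
        long intervalMillis = 30_000L;                      // hypothetical interval
        for (int i = 0; i < 3; i++) {
            System.out.println("wrote " + dumpOnce(dir, i));
            Thread.sleep(intervalMillis);
        }
    }
}
```

Comparing successive dumps for an unusually large array would then narrow down the allocation.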

-- ramki


You should get a stack trace; something must be eating it.

Regards,
Kirk

On Jun 3, 2011, at 3:15 AM, Raman Gupta wrote:

> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError
>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.
>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.
>
> Thanks,
> Raman

Y. Srinivas Ramakrishna wrote:
> Sorry, sent previous email without addressing all of the issues.
>
> On 6/2/2011 6:15 PM, Raman Gupta wrote:
> > It would be *really* handy if there were a switch like:
> >
> > -XX:+StackTraceOnOutOfMemoryError
>
> Yes that would be handy, and probably not too difficult.
> But I wonder also if something like OnOutOfMemoryError or
> like would already get you enough info to get close to
> the problem ... (although may be because it's executed in
> a separate shell, by the time the command executes the
> process has likely gone well past the point when the problem
> occurred).

No need to worry: the OnOutOfMemoryError commands are run while the
JVM is at a safepoint. This worked for me:

java -XX:OnOutOfMemoryError='jstack %p' ...

-John

On 06/03/2011 03:28 AM, John Coomes wrote:
> Y. Srinivas Ramakrishna wrote:
>> Sorry, sent previous email without addressing all of the issues.
>>
>> On 6/2/2011 6:15 PM, Raman Gupta wrote:
>>> It would be *really* handy if there were a switch like:
>>>
>>> -XX:+StackTraceOnOutOfMemoryError
>>
>> Yes that would be handy, and probably not too difficult.
>> But I wonder also if something like OnOutOfMemoryError or
>> like would already get you enough info to get close to
>> the problem ... (although may be because it's executed in
>> a separate shell, by the time the command executes the
>> process has likely gone well past the point when the problem
>> occurred).
>
> No need to worry, the OnOutOfMemoryError commands are run while the
> JVM is at a safepoint. This worked for me:
>
> java -XX:OnOutOfMemoryError='jstack %p' ...
>
> -John

Excellent -- will try this later today.

I did a quick search for places where OOME is caught and swallowed and
found a few places within the JDK (such as direct ByteBuffer
allocation), as well as a couple places in other libraries such as
commons-pool and jgroups, the latter of which is used by Infinispan
(though in some cases, but not all, those are logged before being
swallowed). In short though, I still don't definitively know where the
problem allocation is. So running jstack via OnOutOfMemoryError sounds
like it is just the ticket.

Cheers,
Raman

On 6/3/2011 7:52 AM, Raman Gupta wrote:
> On 06/03/2011 03:28 AM, John Coomes wrote:
>> Y. Srinivas Ramakrishna wrote:
>>> Sorry, sent previous email without addressing all of the issues.
>>>
>>> On 6/2/2011 6:15 PM, Raman Gupta wrote:
>>>> It would be *really* handy if there were a switch like:
>>>>
>>>> -XX:+StackTraceOnOutOfMemoryError
>>>
>>> Yes that would be handy, and probably not too difficult.
>>> But I wonder also if something like OnOutOfMemoryError or
>>> like would already get you enough info to get close to
>>> the problem ... (although may be because it's executed in
>>> a separate shell, by the time the command executes the
>>> process has likely gone well past the point when the problem
>>> occurred).
>>
>> No need to worry, the OnOutOfMemoryError commands are run while the
>> JVM is at a safepoint. This worked for me:
>>
>> java -XX:OnOutOfMemoryError='jstack %p' ...

Really, are you sure? I'd assumed the JVM spawns off a separate (i.e.
asynchronous) shell process rather than waiting at the safepoint for
it to complete (i.e. synchronous). Could it be that one is just
"lucky" and the shell happens to complete before the safepoint is
exited? Anyway, it would be a good idea to check the code to see
whether there is a synchronicity guarantee or whether one relies on
plain luck to sometimes get something useful (which itself is not bad,
but it is good to know when it is good fortune vs actual design :-)

-- ramki

By the way, it would seem that a "safepoint-synchronous"
OnOutOfMemoryError execution would restrict what you could do,
since a command could deadlock if the target (the JVM itself)
needed to be outside a safepoint in order to react to it....

Is there a documented restriction on what commands can
be run within OnOutOfMemoryError (or even a flat caveat emptor)?

-- ramki

John was right. I looked at the code and the JVM
waits for the child process to complete and return: the
command is "safepoint-synchronous". Maybe all
the j* commands are such that they can produce their
results without needing mutator threads to run, because
I also could not find any documented restrictions on the use of
OnOutOfMemoryError.

So, yes, it looks like OnOutOfMemoryError will do what you want here,
and my worries were baseless.
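
To illustrate what "safepoint-synchronous" amounts to: the launcher starts the command and makes no further progress until the child has exited. The JVM does this in native code; the Java sketch below only mirrors the same launch-and-wait pattern from the outside. The class and method names are hypothetical, and "java -version" merely stands in for a diagnostic command such as jstack.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RunAndWait {

    /** Starts the command and blocks until the child exits; returns its exit code. */
    static int runAndWait(String... command) {
        try {
            Process child = new ProcessBuilder(command).inheritIO().start();
            return child.waitFor(); // the caller makes no progress until the child is done
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
    }

    /** Path of the java launcher that belongs to the running JVM. */
    static String javaLauncher() {
        return System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
    }

    public static void main(String[] args) {
        int exit = runAndWait(javaLauncher(), "-version");
        System.out.println("child exited with " + exit);
    }
}
```

An asynchronous variant would return right after start() without calling waitFor(), which is the behaviour I had wrongly assumed.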

Thanks, John!
-- ramki


  #20  
06-06-2011 05:15 PM

I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.

Linux RHEL 5.6

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

(I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).

Startup parameters:

-server
-Xms256m
-Xmx256m
-XX:+HeapDumpOnOutOfMemoryError
-XX:MaxPermSize=64m
-verbose:gc
-XX:-UseGCOverheadLimit
-XX:+DisableExplicitGC
-XX:+UseParallelGC
-XX:+UseFastAccessorMethods
-XX:AdaptiveSizePolicyOutputInterval=1
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime

The complete GC log is available here:

http://dl.dropbox.com/u/3430279/gc.log

but here is a short snippet from that log before and after the OOM:

2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
Total time for which application threads were stopped: 1.2760680 seconds
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid18706.hprof ...
Application time: 0.9442610 seconds
Total time for which application threads were stopped: 2.4584870 seconds
Heap dump file created [83874513 bytes in 3.330 secs]
2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]

Note that:

1) I can reproduce this easily.

2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).

3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.

4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.

5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.

6) The heap dump in pid18706.hprof shows no objects in the finalization queue.

7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.

8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).

I'm not sure what else to try or where else to look. Any suggestions?

Cheers,
Raman Gupta
)
Raman,

The gc.log looks like it has the young collections
filtered out. Is that right? If so, please upload
the complete log.

Jon

On 05/27/11 10:57, Raman Gupta wrote:
> I am getting a heap OOM for no apparent reason. Normally permanent generation size or a large finalization queue would be culprits for this sort of OOM, but AFAICT GC logs and heap dump show that neither is the case here. There is also no native code being executed.
>
> Linux RHEL 5.6
>
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>
> (I have also tried it with 1.6.0_25 64-bit and 1.6.0_25 32-bit with the same OOM result for both, but the logs below are from 1.6.0_24 64-bit).
>
> Startup parameters:
>
> -server
> -Xms256m
> -Xmx256m
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:MaxPermSize=64m
> -verbose:gc
> -XX:-UseGCOverheadLimit
> -XX:+DisableExplicitGC
> -XX:+UseParallelGC
> -XX:+UseFastAccessorMethods
> -XX:AdaptiveSizePolicyOutputInterval=1
> -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCApplicationConcurrentTime
> -XX:+PrintGCApplicationStoppedTime
>
> The complete GC log is available here:
>
> http://dl.dropbox.com/u/3430279/gc.log
>
> but here is a short snippet from that log before and after the OOM:
>
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
> Total time for which application threads were stopped: 1.2760680 seconds
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid18706.hprof ...
> Application time: 0.9442610 seconds
> Total time for which application threads were stopped: 2.4584870 seconds
> Heap dump file created [83874513 bytes in 3.330 secs]
> 2011-05-27T11:35:30.942-0400: 160.552: [Full GC [PSYoungGen: 2624K->0K(44800K)] [PSOldGen: 86916K->76124K(174784K)] 89540K->76124K(219584K) [PSPermGen: 26902K->26902K(58880K)], 0.2801510 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
>
> Note that:
>
> 1) I can reproduce this easily.
>
> 2) This seems to happen only once shortly after load is applied to the system shortly after startup -- after that everything seems fine (though I haven't yet verified this with a longer test).
>
> 3) There is *always* a Full GC showing a young generation 0K->0K collection before this happens.
>
> 4) There seems to be plenty of space in the tenured generation as well as in permanent at OOM time.
>
> 5) The heap dump in pid18706.hprof shows only 70 MB of live objects in the heap.
>
> 6) The heap dump in pid18706.hprof shows no objects in the finalization queue.
>
> 7) I note the OutOfMemoryError in the log does not have any stack trace associated with it as is normally present.
>
> 8) When running the JVM with hprof sites profiling turned on (-agentlib:hprof=heap=sites) no OOM occurs (though again I did not try a very long test).
>
> I'm not sure what else to try or where else to look. Any suggestions?
>
> Cheers,
> Raman Gupta
)
+1, can't see anything in these logs that indicates you're heading to an OOME due to Java heap. But, incomplete log...

Regards,
Kirk

On Jun 1, 2011, at 7:24 PM, Jon Masamitsu wrote:

> Raman,
>
> The gc.log looks like it has the young collections
> filtered out. Is that right? If so, please upload
> the complete log.
>
> Jon

)
My bad -- I filtered out the young GCs with an errant grep command...
this log should be ok:

http://dl.dropbox.com/u/3430279/gc.log

This is from a different test run than the one before, but
demonstrates the same "problem".

Cheers,
Raman

On 06/01/2011 01:35 PM, Charles K Pepperdine wrote:
> +1, can't see anything in these logs that indicates you're heading to an OOME due to Java heap. But, incomplete log...
>
> Regards,
> Kirk
>
)
2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.65 (attempted to grow)
Tenured generation: 0.54 (attempted to grow)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 2.01 (attempted to grow)
Tenured generation: 0.54 (no change)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]

The first Full GC looks normal, as PSOldGen is (174784-169093)K = 5691K from being full. Nothing is happening in Perm.
The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the Full GC and yet no application thread allocated any memory out of young gen.
For some reason that "failed" young gen collection triggers an immediate Full GC.
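
The headroom arithmetic above can be reproduced mechanically. What follows is a minimal Java sketch, not something posted in the thread: the class and method names are made up for illustration, and the regular expression assumes the PSOldGen entry format seen in the log lines quoted in this message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the PSOldGen figures from a ParallelGC log line.
class OldGenHeadroom {
    // Matches an entry such as "[PSOldGen: 169093K->62518K(174784K)]".
    private static final Pattern OLD_GEN =
            Pattern.compile("\\[PSOldGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    /** Returns {usedBeforeK, usedAfterK, capacityK}, or null if the line has no PSOldGen entry. */
    static long[] parseOldGen(String line) {
        Matcher m = OLD_GEN.matcher(line);
        if (!m.find()) {
            return null; // e.g. a young collection line
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    public static void main(String[] args) {
        String line = "2011-05-27T11:35:18.214-0400: 147.823: [Full GC "
                + "[PSYoungGen: 7662K->0K(72896K)] "
                + "[PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) "
                + "[PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs]";
        long[] g = parseOldGen(line);
        // Free space in the old generation just before and just after the collection.
        System.out.println("free before GC: " + (g[2] - g[0]) + "K");
        System.out.println("free after GC:  " + (g[2] - g[1]) + "K");
    }
}
```

For the first Full GC this gives 5691K free before the collection, matching the figure above, and 112266K free afterwards.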

Bug in the collector? Did you check the bug database?

Regards,
Kirk

)
I did check the database but didn't find anything relevant. My search
terms may not be optimal, though I did scan through all the results
returned by "java.lang.OutOfMemoryError: Java heap space" as well as
"0K->0K".

I also suspected a bug in the collector and so I tried the same test
with the G1 collector, with the same OOM result. I didn't save the log
from the G1 test, but I can quite easily redo the test with any set of
JVM parameters that may be helpful in debugging -- the OOM seems to be
easily and consistently reproducible with this application.

Cheers,
Raman

On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.65 (attempted to grow)
> Tenured generation: 0.54 (attempted to grow)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
> GC overhead (%)
> Young generation: 2.01 (attempted to grow)
> Tenured generation: 0.54 (no change)
> Tenuring threshold: (attempted to decrease to balance GC costs) = 1
> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>
> The first Full GC looks normal, as PSOldGen is (174784-169093)K = 5691K from being full. Nothing is happening in Perm.
> The second is where things start to get weird. I don't see why that GC was called. Stranger still, it's ~800ms *after* the Full GC and yet no application thread allocated any memory out of young gen.
> For some reason that "failed" young gen collection triggers an immediate Full GC.
>
> Bug in the collector? Did you check the bug database?
>
> Regards,
> Kirk
>
)
Are you trying to create a humongous object or array? Accidentally?

On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:

> I did check the database but didn't find anything relevant. My search
> terms may not be optimal, though I did scan through all the results
> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
> "0K->0K".
>
> I also suspected a bug in the collector and so I tried the same test
> with the G1 collector, with the same OOM result. I didn't save the log
> from the G1 test, but I can quite easily redo the test with any set of
> JVM parameters that may be helpful in debugging -- the OOM seems to be
> easily and consistently reproducible with this application.
>
> Cheers,
> Raman

)
I do tend to think that somewhere a large object or array is being
created. In particular, Infinispan is one library we are using that
may be allocating large chunks of memory -- indeed, replacing
Infinispan with a local cache does seem to "fix" the problem.

However, more information from the JVM would really be useful in
isolating the offending code in Infinispan. Ideally,

a) any large allocations should show up as part of the heap dump if
the allocation succeeded but then some other subsequent code caused
the OOM, or

b) if the allocation itself failed, the OOM exception should include a
stack trace that would allow me to isolate the allocation point (as
it does normally, but for some reason in this case doesn't).

In this case the heap dump shows plenty of room in heap, and there is
no stack trace at the OOM, so I don't really have any way to isolate
the offending allocation point. In which situations does the OOM
exception get printed without an associated stack trace?

Cheers,
Raman


On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
> Are you trying to create a humongous object or array? Accidentally?
)
If your code is not catching the OOM exception, you'd
expect to see the stack trace when the program dies.
If it catches the exception and carries on, you'd want
it to print the exception detail. I don't know of
cases where the exception would just disappear.

In your case the report to stdout/stderr(?) that an OOM occurred and that
the heap is being dumped comes from inside the JVM
because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
After this point, your allocating thread would have gotten
an OOME which it probably caught and swallowed, hence
the absence of the stack trace you would normally see. You
will want to look at your Infinispan code to see how
it deals with the inability to allocate said large objects.
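
To make the "caught and swallowed" case concrete, here is a small Java sketch. It is an illustration only and is not taken from Infinispan or any other library named in this thread; the class and method names are hypothetical. It requests an array of length Integer.MAX_VALUE purely because that request is expected to fail with an OutOfMemoryError on common JVMs regardless of heap size.

```java
// Hypothetical demo of an OutOfMemoryError that is caught and swallowed.
class SwallowedOome {
    /** The problematic pattern: the failure is hidden and no stack trace is printed. */
    static long[] allocateQuietly(int length) {
        try {
            return new long[length];
        } catch (OutOfMemoryError e) {
            return null; // swallowed: the application never reports where this happened
        }
    }

    /** A friendlier variant: still recovers, but records the allocation site first. */
    static long[] allocateLogged(int length) {
        try {
            return new long[length];
        } catch (OutOfMemoryError e) {
            e.printStackTrace(); // the stack trace identifies the failing allocation
            return null;
        }
    }

    public static void main(String[] args) {
        // The JVM-level reporting (e.g. a heap dump, if that flag is set) happens
        // when the error is raised, before this catch block ever runs.
        long[] quiet = allocateQuietly(Integer.MAX_VALUE);
        System.out.println("quiet allocation succeeded: " + (quiet != null));
        long[] logged = allocateLogged(Integer.MAX_VALUE);
        System.out.println("logged allocation succeeded: " + (logged != null));
    }
}
```

With the first variant the only evidence of the failure is whatever the JVM itself prints, which is consistent with the bare "java.lang.OutOfMemoryError: Java heap space" line in the log.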

Recall that object size is limited by the size of and
available space in the largest area (Eden or Old) in your
Java heap. As Kirk noted, the full gc was to attempt allocation
of an object that didn't fit into the available space in
Eden or in Old (so from that you can estimate the size of
the request).
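
As a rough illustration of that estimate, the sketch below plugs in the figures from the second Full GC line (young gen 0K used of 57856K, old gen 52158K used of 174784K). The class and method names are hypothetical, and two simplifications are assumed: the reported PSYoungGen capacity is used as an upper bound on free Eden space, and the old generation is assumed unable to grow further at that point.

```java
// Hypothetical back-of-the-envelope estimate of the failed request size.
class RequestSizeEstimate {
    /** Free space in K for an area, given its used and capacity figures (both in K). */
    static long freeK(long usedK, long capacityK) {
        return capacityK - usedK;
    }

    /**
     * A request that still fails after a Full GC fit in neither area, so it was
     * larger than the bigger of the two free spaces.
     */
    static long lowerBoundK(long youngFreeK, long oldFreeK) {
        return Math.max(youngFreeK, oldFreeK);
    }

    public static void main(String[] args) {
        long youngFreeK = freeK(0, 57856);     // upper bound on free Eden
        long oldFreeK = freeK(52158, 174784);  // free space in PSOldGen after the Full GC
        long boundK = lowerBoundK(youngFreeK, oldFreeK);
        System.out.println("request was larger than roughly " + boundK + "K");
    }
}
```

Under those assumptions the failed request would have been larger than about 122626K, i.e. close to 120 MB.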

Note also that the JDK libraries will resize hashtables under
you and that can also cause large allocation requests
(but I don't know how they handle OOMs resulting from such
allocations).
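
To give a feel for how large such a resize request can get, here is a hedged Java sketch that models the documented java.util.HashMap defaults (initial capacity 16, load factor 0.75, table length doubling). It is a model for estimation, not the JDK's actual code; the names are hypothetical, and the bytes-per-reference value is an assumption that depends on the JVM (for example on whether compressed references are in use).

```java
// Hypothetical model of HashMap table growth, for sizing estimates only.
class TableResizeEstimate {
    static final double LOAD_FACTOR = 0.75;   // java.util.HashMap default load factor
    static final int MAX_CAPACITY = 1 << 30;  // HashMap's documented maximum capacity

    /** Smallest power-of-two table length that holds `entries` within the load factor. */
    static int tableLengthFor(int entries) {
        int length = 16; // HashMap default initial capacity
        while (length < MAX_CAPACITY && entries > length * LOAD_FACTOR) {
            length <<= 1; // each resize doubles the table
        }
        return length;
    }

    /** Approximate size of the table's reference array, ignoring the array header. */
    static long tableBytes(int length, int bytesPerReference) {
        return (long) length * bytesPerReference;
    }

    public static void main(String[] args) {
        int entries = 1000000;
        int length = tableLengthFor(entries);
        // 8 bytes per reference is an assumption (64-bit JVM without compressed references).
        System.out.println(entries + " entries -> table of " + length
                + " slots, about " + tableBytes(length, 8) + " bytes in one array");
    }
}
```

The point is that the new table is requested as a single contiguous array while the old one is still live, so a map that grows past a threshold can issue one allocation of many megabytes.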

-- ramki

On 06/02/11 09:56, Raman Gupta wrote:
> I do tend to think that somewhere a large object or array is being
> created. In particular, Infinispan is one library we are using that
> may be allocating large chunks of memory -- indeed, replacing
> Infinispan with a local cache does seem to "fix" the problem.
>
> However, more information from the JVM would really be useful in
> isolating the offending code in Infinispan. Ideally,
>
> a) any large allocations should show up as part of the heap dump if
> the allocation succeeded but then some other subsequent code caused
> the OOM, or
>
> b) if the allocation itself failed, the OOM exception should include a
> stack trace that would allow me to isolate the allocation point (as
> it does normally, but for some reason in this case doesn't).
>
> In this case the heap dump shows plenty of room in heap, and there is
> no stack trace at the OOM, so I don't really have any way to isolate
> the offending allocation point. In which situations does the OOM
> exception get printed without an associated stack trace?
>
> Cheers,
> Raman
)
Well, GC is somewhat orthogonal to what your application is up to, except for this special case. I've cc'ed Manik on this one; maybe he's had someone run into it before.

Regards,
Kirk

On Jun 2, 2011, at 6:56 PM, Raman Gupta wrote:

> I do tend to think that somewhere a large object or array is being
> created. In particular, Infinispan is one library we are using that
> may be allocating large chunks of memory -- indeed, replacing
> Infinispan with a local cache does seem to "fix" the problem.
>
> However, more information from the JVM would really be useful in
> isolating the offending code in Infinispan. Ideally,
>
> a) any large allocations should show up as part of the heap dump if
> the allocation succeeded but then some other subsequent code caused
> the OOM, or
>
> b) if the allocation itself failed, the OOM exception should include a
> stack trace that would allow me to isolate the allocation point (as
> it does normally, but for some reason in this case doesn't).
>
> In this case the heap dump shows plenty of room in heap, and there is
> no stack trace at the OOM, so I don't really have any way to isolate
> the offending allocation point. In which situations does the OOM
> exception get printed without an associated stack trace?
>
> Cheers,
> Raman

)
It would be *really* handy if there were a switch like:

-XX:+StackTraceOnOutOfMemoryError

to force the stack trace to be shown. Obviously looking at every line
of code of every library my application uses, including core JDK
libraries, for code paths where large amounts of heap may be allocated
and the associated OOME is caught and swallowed, is pretty much
impossible.

I think my next step is to increase the max heap size to a large value
which hopefully allows the large allocation to occur without failure,
and then periodically take heap dumps to isolate it.

Thanks,
Raman

On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
> If your code is not catching the OOM exception you'd
> expect to see the stack retrace when the program dies.
> If it catches the exception and carries on, you'd want
> it to print the exception detail. I don't know of
> cases where the exception would just disappear.
>
> In your case the report to stdout/stderr(?) that an OOM occurred and that
> the heap is being dumped comes from inside the JVM
> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
> After this point, your allocating thread would have gotten
> an OOME which it probably caught and swallowed, and hence
> the silence wrt the stack retrace you would normally see. You
> will want to look at your Infinispan code to see how
> it deals with the inability to allocate said large objects.
>
> Recall that object size is limited by the size of and
> available space in the largest area (Eden or Old) in your
> Java heap. As Kirk noted, the full gc was to attempt allocation
> of an object that didn't fit into the available space in
> Eden or in Old (so from that you can estimate the size of
> the request).
>
> Note also that the JDK libraries will resize hashtables under
> you and that can also cause large allocation requests
> (but I don't know how they handle OOMs resulting from such
> allocations).
>
> -- ramki
>
> On 06/02/11 09:56, Raman Gupta wrote:
>> I do tend to think that somewhere a large object or array is being
>> created. In particular, Infinispan is one library we are using that
>> may be allocating large chunks of memory -- indeed, replacing
>> Infinispan with a local cache does seem to "fix" the problem.
>>
>> However, more information from the JVM would really be useful in
>> isolating the offending code in Infinispan. Ideally,
>>
>> a) any large allocations should show up as part of the heap dump if
>> the allocation succeeded but then some other subsequent code caused
>> the OOM, or
>>
>> b) if the allocation itself failed, the OOM exception should include a
>> stack trace that would allow me to isolate the allocation point (as
>> it does normally, but for some reason in this case doesn't).
>>
>> In this case the heap dump shows plenty of room in heap, and there is
>> no stack trace at the OOM, so I don't really have any way to isolate
>> the offending allocation point. In which situations does the OOM
>> exception get printed without an associated stack trace?
>>
>> Cheers,
>> Raman
>>
>>
>> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
>>> Are you trying to create a humongous object or array? Accidentally?
>>>
>>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>>>
>>>> I did check the database but didn't find anything relevant. My search
>>>> terms may not be optimal, though I did scan through all the results
>>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>>>> "0K->0K".
>>>>
>>>> I also suspected a bug in the collector and so I tried the same test
>>>> with the G1 collector, with the same OOM result. I didn't save the
>>>> log
>>>> from the G1 test, but I can quite easily redo the test with any
>>>> set of
>>>> JVM parameters that may be helpful in debugging -- the OOM seems
>>>> to be
>>>> easily and consistently reproducible with this application.
>>>>
>>>> Cheers,
>>>> Raman
>>>>
>>>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen:
>>>>> 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)]
>>>>> 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)],
>>>>> 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>>> GC overhead (%)
>>>>> Young generation: 2.65 (attempted to grow)
>>>>> Tenured generation: 0.54 (attempted to grow)
>>>>> Tenuring threshold: (attempted to decrease to balance GC
>>>>> costs) = 1
>>>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen:
>>>>> 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times:
>>>>> user=0.00 sys=0.00, real=0.00 secs] UseAdaptiveSizePolicy
>>>>> actions to meet *** throughput goal ***
>>>>> GC overhead (%)
>>>>> Young generation: 2.01 (attempted to grow)
>>>>> Tenured generation: 0.54 (no change)
>>>>> Tenuring threshold: (attempted to decrease to balance GC
>>>>> costs) = 1
>>>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen:
>>>>> 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)]
>>>>> 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)],
>>>>> 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>>> The first Full GC looks normal, as PSOldGen is (174784-169093)K =
>>>>> 5691K from being full. Nothing is happening in Perm.
>>>>> The second collection is where things start to get weird. I don't see
>>>>> why that GC was called. Stranger still, it comes ~800ms *after* the
>>>>> Full GC, and yet no application thread allocated any memory out of
>>>>> young gen.
>>>>> For some reason that "failed" young gen collection triggers an
>>>>> immediate Full GC.
>>>>>
>>>>> Bug in the collector? Did you check the bug database?
>>>>>
>>>>> Regards,
>>>>> Kirk
>>>>>
You "just" have to find all the places where OOME is caught.
Hopefully there aren't too many of those?
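
To make the "caught and swallowed" scenario concrete, here is a minimal sketch of what such a catch site could look like. The class and method names are hypothetical and are not taken from Infinispan, JGroups, or the JDK. The oversized array request is just a convenient way to provoke an OutOfMemoryError; it is assumed to fail on HotSpot because the requested length exceeds the VM's array size limit.

```java
final class SwallowedOomeDemo {
    /**
     * Hypothetical library-style code: catches Throwable and silently falls
     * back to null, so the caller never sees a stack trace.
     */
    static long[] allocateOrNull(int length) {
        try {
            return new long[length];
        } catch (Throwable t) { // Throwable also covers OutOfMemoryError
            return null;        // swallowed: nothing is printed or logged
        }
    }

    /** Same allocation, but the error is printed and rethrown. */
    static long[] allocateOrReport(int length) {
        try {
            return new long[length];
        } catch (OutOfMemoryError e) {
            e.printStackTrace(); // stack trace goes to stderr
            throw e;
        }
    }

    public static void main(String[] args) {
        long[] swallowed = allocateOrNull(Integer.MAX_VALUE);
        System.out.println("swallowing variant returned null: " + (swallowed == null));
        try {
            allocateOrReport(Integer.MAX_VALUE);
        } catch (OutOfMemoryError e) {
            System.out.println("reporting variant rethrew: " + e);
        }
    }
}
```

A catch block shaped like the first method is the kind of site to search for. The second method shows the same failure with the trace made visible before the error propagates.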

-- ramki

On 6/2/2011 6:15 PM, Raman Gupta wrote:
> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError
>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.
>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.
>
> Thanks,
> Raman

Sorry, sent previous email without addressing all of the issues.

On 6/2/2011 6:15 PM, Raman Gupta wrote:
> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError

Yes, that would be handy, and probably not too difficult.
But I also wonder whether something like -XX:OnOutOfMemoryError or
the like would already get you enough info to get close to
the problem ... (although maybe, because it's executed in
a separate shell, by the time the command executes the
process has likely gone well past the point when the problem
occurred).

>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.

Try to look for places where OOME (or supertype?) is caught. I am
hoping there aren't too many of those...
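
On the "(or supertype?)" point: a catch clause for any superclass of OutOfMemoryError will also intercept it, so those are the types worth searching for. The following sketch simply walks the superclass chain via reflection; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class OomeSupertypes {
    /** Returns the names of all superclasses of the given type, nearest first. */
    static List<String> supertypeNames(Class<?> type) {
        List<String> names = new ArrayList<String>();
        for (Class<?> c = type.getSuperclass(); c != null; c = c.getSuperclass()) {
            names.add(c.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints the chain VirtualMachineError, Error, Throwable, Object.
        System.out.println(supertypeNames(OutOfMemoryError.class));
    }
}
```

In practice that means looking for catch blocks on OutOfMemoryError, VirtualMachineError, Error, and Throwable.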

>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.

Yes, that seems reasonable, or maybe use an allocation profiler
with the larger heap and find it that way...

-- ramki

>
> Thanks,
> Raman

You should get a stack trace; something must be eating it.

Regards,
Kirk

On Jun 3, 2011, at 3:15 AM, Raman Gupta wrote:

> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError
>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.
>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.
>
> Thanks,
> Raman

Y. Srinivas Ramakrishna wrote:
> Sorry, sent previous email without addressing all of the issues.
>
> On 6/2/2011 6:15 PM, Raman Gupta wrote:
> > It would be *really* handy if there were a switch like:
> >
> > -XX:+StackTraceOnOutOfMemoryError
>
> Yes, that would be handy, and probably not too difficult.
> But I also wonder whether something like -XX:OnOutOfMemoryError or
> the like would already get you enough info to get close to
> the problem ... (although maybe, because it's executed in
> a separate shell, by the time the command executes the
> process has likely gone well past the point when the problem
> occurred).

No need to worry, the OnOutOfMemoryError commands are run while the
JVM is at a safepoint. This worked for me:

java -XX:OnOutOfMemoryError='jstack %p' ...

-John

> > to force the stack trace to be shown. Obviously looking at every line
> > of code of every library my application uses, including core JDK
> > libraries, for code paths where large amounts of heap may be allocated
> > and the associated OOME is caught and swallowed, is pretty much
> > impossible.
>
> Try to look for places where OOME (or supertype?) is caught. I am
> hoping there aren't too many of those...
>
> >
> > I think my next step is to increase the max heap size to a large value
> > which hopefully allows the large allocation to occur without failure,
> > and then periodically take heap dumps to isolate it.
>
> Yes, that seems reasonable, or maybe use an allocation profiler
> with the larger heap and find it that way...
>
> -- ramki
>
> >
> > Thanks,
> > Raman
On 06/03/2011 03:28 AM, John Coomes wrote:
> Y. Srinivas Ramakrishna wrote:
>> Sorry, sent previous email without addressing all of the issues.
>>
>> On 6/2/2011 6:15 PM, Raman Gupta wrote:
>>> It would be *really* handy if there were a switch like:
>>>
>>> -XX:+StackTraceOnOutOfMemoryError
>>
>> Yes, that would be handy, and probably not too difficult.
>> But I also wonder whether something like -XX:OnOutOfMemoryError or
>> the like would already get you enough info to get close to
>> the problem ... (although maybe, because it's executed in
>> a separate shell, by the time the command executes the
>> process has likely gone well past the point when the problem
>> occurred).
>
> No need to worry, the OnOutOfMemoryError commands are run while the
> JVM is at a safepoint. This worked for me:
>
> java -XX:OnOutOfMemoryError='jstack %p' ...
>
> -John

Excellent -- will try this later today.

I did a quick search for places where OOME is caught and swallowed and
found a few places within the JDK (such as direct ByteBuffer
allocation), as well as a couple places in other libraries such as
commons-pool and jgroups, the latter of which is used by Infinispan
(though in some cases, but not all, those are logged before being
swallowed). In short though, I still don't definitively know where the
problem allocation is. So running jstack via OnOutOfMemoryError sounds
like it is just the ticket.
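
Before pointing this at the real application, the flag can be tried against a throwaway class that exhausts the heap on purpose. The sketch below is an assumption, not anything shipped with the JDK: the class name HeapFiller and its allocate helper are invented for this test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test class: fills the heap deliberately so that
// -XX:OnOutOfMemoryError has something to react to.
final class HeapFiller {

    // Allocates up to maxChunks arrays of chunkBytes each, keeps them all
    // reachable, and returns the total number of bytes retained.
    static long allocate(int maxChunks, int chunkBytes) {
        List<byte[]> retained = new ArrayList<byte[]>();
        long total = 0;
        for (int i = 0; i < maxChunks; i++) {
            retained.add(new byte[chunkBytes]);
            total += chunkBytes;
        }
        return total;
    }

    public static void main(String[] args) {
        // With a small -Xmx this loop is expected to end in
        // java.lang.OutOfMemoryError: Java heap space.
        allocate(Integer.MAX_VALUE, 1 << 20);
    }
}
```

Something like java -Xmx32m -XX:OnOutOfMemoryError='jstack %p' HeapFiller should then show whether the thread dump appears when the error is raised.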

Cheers,
Raman
)
On 6/3/2011 7:52 AM, Raman Gupta wrote:
> On 06/03/2011 03:28 AM, John Coomes wrote:
>> Y. Srinivas Ramakrishna () wrote:
>>> Sorry, sent previous email without addressing all of the issues.
>>>
>>> On 6/2/2011 6:15 PM, Raman Gupta wrote:
>>>> It would be *really* handy if there were a switch like:
>>>>
>>>> -XX:+StackTraceOnOutOfMemoryError
>>>
>>> Yes that would be handy, and probably not too difficult.
>>> But I wonder also if something like OnOutOfMemoryError or
>>> the like would already get you enough info to get close to
>>> the problem ... (although maybe, because it's executed in
>>> a separate shell, by the time the command executes the
>>> process has likely gone well past the point when the problem
>>> occurred).
>>
>> No need to worry, the OnOutOfMemoryError commands are run while the
>> JVM is at a safepoint. This worked for me:
>>
>> java -XX:OnOutOfMemoryError='jstack %p' ...

Really, are you sure? I'd assumed you spawn off a separate (i.e. asynchronous)
shell process rather than waiting for it to complete while you waited
in the safepoint (i.e. synchronous). It could still be that one is
"lucky" and the shell happens to complete before the safepoint is
exited? Anyway, a good idea to check the code to see if there is
a synchronicity guarantee or one relies on plain luck to sometimes
get something useful (which itself is not bad, but good to know
when it is good fortune vs actual design :-)

-- ramki


)
By the way, it would seem that a "safepoint synchronous"
OnOutOfMemoryError execution would restrict what you could do,
just in case that caused a deadlock because the target (self)
might need to be at a non-safepoint to react to that command....

Is there such a documented restriction on what commands can
be run within OnOutOfMemoryError (or even a flat caveat emptor)?

-- ramki

On 6/3/2011 8:26 AM, Y. Srinivas Ramakrishna wrote:
> Really, are you sure? I'd assumed you spawn off a separate (i.e. asynchronous)
> shell process rather than waiting for it to complete while you waited
> in the safepoint (i.e. synchronous). It could still be that one is
> "lucky" and the shell happens to complete before the safepoint is
> exited? Anyway, a good idea to check the code to see if there is
> a synchronicity guarantee or one relies on plain luck to sometimes
> get something useful (which itself is not bad, but good to know
> when it is good fortune vs actual design :-)
>
> -- ramki

)
John was right. I looked at the code and the process
waits for the child to complete and return: the
command is "safepoint-synchronous". Maybe all
the j* commands are such that they can produce their
results without needing mutator threads to run, because
I also could not find any documented restrictions on the use of
OnOutOfMemoryError.

So, yes, it looks like OnOutOfMemoryError will do what you want here,
and my worries were baseless.

thanks John!
-- ramki

On 6/3/2011 8:32 AM, Y. Srinivas Ramakrishna wrote:
> By the way, it would seem that a "safepoint synchronous"
> OnOutOfMemoryError execution would restrict what you could do,
> just in case that caused a deadlock because the target (self)
> might need to be at a non-safepoint to react to that command....
>
> Is there such a documented restriction on what commands can
> be run within OnOutOfMemoryError (or even a flat caveat emptor)?
>
> -- ramki

)
To close this off from the perspective of the gc-dev list... using
jstack in combination with the OnOutOfMemoryError parameter was indeed
extremely useful in determining the allocation point causing the OOME.

The allocation point was an Object deserialization triggered by
Infinispan's unmarshalling code, and the OOME was swallowed by
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle()
by catching Throwable -- which only logs the Throwable if log.trace is
enabled (yuck).
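
To make the swallowing pattern concrete, here is a minimal sketch of what such a handler does and one possible alternative. This is not Infinispan's actual code: SwallowDemo, both method names, and the StringBuilder standing in for a logger are hypothetical, and the OutOfMemoryError in any trial run would be thrown by hand rather than by a real allocation failure.

```java
// Hypothetical stand-in for a message handler; names are illustrative only.
final class SwallowDemo {

    // Anti-pattern: every Throwable, including OutOfMemoryError, is hidden
    // from the caller and only logged when tracing happens to be enabled.
    static String handleSwallowing(Runnable work, boolean traceEnabled, StringBuilder log) {
        try {
            work.run();
            return "ok";
        } catch (Throwable t) {
            if (traceEnabled) {
                log.append("TRACE ").append(t).append('\n');
            }
            return "failed";
        }
    }

    // One alternative: always log an Error and rethrow it, and handle
    // only ordinary runtime exceptions locally.
    static String handleReporting(Runnable work, StringBuilder log) {
        try {
            work.run();
            return "ok";
        } catch (Error e) {
            log.append("ERROR ").append(e).append('\n');
            throw e;
        } catch (RuntimeException e) {
            log.append("WARN ").append(e).append('\n');
            return "failed";
        }
    }
}
```

With trace logging off, the first variant leaves no record of the error at all, which is consistent with nothing showing up in the application logs here.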

In addition, at the point of the allocation failure the unmarshaller
is trying to allocate a very large array of chars which *should* be no
longer than 6, so it looks like there is a bug somewhere in the
application, in JGroups, or in Infinispan.
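
As an illustration of how a bad length field can turn into a huge allocation, and of one defensive check against it, here is a small sketch. It assumes a simple wire format of a 4-byte length followed by that many chars, which is an assumption for the example and not the JGroups or Infinispan marshalling format. BoundedCharReader and readChars are hypothetical names.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for a length-prefixed char array.
final class BoundedCharReader {

    // Reads an int length followed by that many chars, refusing lengths
    // that are negative, above maxLength, or larger than the remaining input.
    static char[] readChars(ByteBuffer in, int maxLength) {
        int length = in.getInt();
        if (length < 0 || length > maxLength) {
            throw new IllegalArgumentException(
                "implausible char[] length " + length + " (limit " + maxLength + ")");
        }
        if ((long) length * 2 > in.remaining()) {
            throw new IllegalArgumentException("truncated input for length " + length);
        }
        // Safe to allocate now: the size is bounded by maxLength.
        char[] chars = new char[length];
        for (int i = 0; i < length; i++) {
            chars[i] = in.getChar();
        }
        return chars;
    }
}
```

Without the bound, a corrupted or misaligned stream that yields a length in the hundreds of millions would go straight to new char[length], which is the kind of allocation the stack trace pointed at.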

Thanks a lot to all of you for your help, as usual.

Cheers,
Raman Gupta

On 06/03/2011 11:50 AM, Y. Srinivas Ramakrishna wrote:
> John was right. I looked at the code and the process
> waits for the child to complete and return: the
> command is "safepoint-synchronous". Maybe all
> the j* commands are such that they can produce their
> results without needing mutator threads to run, because
> I also could not find any documented restrictions on the use of
> OnOutOfMemoryError.
>
> So, yes, it looks like OnOutOfMemoryError will do what you want here,
> and my worries were baseless.
>
> thanks John!
> -- ramki
)




