Freebsd-stable Archive

  #1  
29-05-2012 08:26 PM

Dear All,

I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.

In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.

To clarify, the data set used to live on ada2 (see the devlist below), which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB per second being written (it is mostly-write data) but has not changed the hangs. If anything, they have gotten worse since.

Using gstat I notice that the I/O service time is quite high. From the gstat output below you can see that it takes just over 2 seconds to serve the write requests (ms/w is 2062). L(q) never seems to drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot this further? What can I do to make the underlying problem visible?

I should mention that all data is referenced through cross-mountpoint symlinks. Would that make a difference? Should I use canonical paths in the code instead?

All file systems are mounted "noatime, soft-updates".

Details:

# uname -a
FreeBSD cumin.java-monitor.com 9.0-STABLE FreeBSD 9.0-STABLE #0: Mon Mar 26 14:30:19 UTC 2012 :/usr/obj/usr/src/sys/CUMIN amd64
# gstat -f 'ada[0-3]$' -b
dT: 1.001s  w: 1.000s  filter: ada[0-3]$
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w  %busy  Name
    0      0      0      0    0.0      0      0    0.0    0.0  ada0
    0      0      0      0    0.0      0      0    0.0    0.0  ada1
    0      0      0      0    0.0      0      0    0.0    0.0  ada2
  103    273      0      0    0.0    273  34630   2062  121.9  ada3
# camcontrol devlist
at scbus1 target 0 lun 0 (pass0,ada0)
at scbus2 target 0 lun 0 (pass1,ada1)
at scbus3 target 0 lun 0 (pass2,ada2)
at scbus4 target 0 lun 0 (pass3,ada3)
at scbus7 target 0 lun 0 (pass4,cd0)
at scbus8 target 0 lun 0 (pass5,cd1)
# _
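
As a rough reading aid for the ada3 line in the gstat output above, here is a minimal sketch that derives two figures from it: the average size of a write, and a Little's-law style wait estimate (queue length divided by completion rate). The function name summarize_gstat_line is made up for illustration, and the column order is taken from the gstat header shown in this post. The wait estimate comes from a single one-second sample, so treat it as a sanity check only.

```shell
# Sample line copied from the gstat -b output above (ada3).
line='103 273 0 0 0.0 273 34630 2062 121.9 ada3'

# Columns: L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
# summarize_gstat_line is a hypothetical helper, not part of gstat.
summarize_gstat_line() {
    echo "$1" | awk '{
        lq = $1; ops = $2; ws = $6; wkbps = $7; msw = $8
        # Average write size in kB: write throughput / writes per second.
        if (ws > 0)  printf "avg_write_kB=%.1f\n", wkbps / ws
        # Queue length / completion rate, converted to milliseconds.
        if (ops > 0) printf "little_wait_ms=%.0f\n", 1000 * lq / ops
        # The per-write latency gstat itself reported.
        printf "reported_ms_per_write=%s\n", msw
    }'
}

summarize_gstat_line "$line"
```

For the sample line this works out to writes of roughly 127 kB each. The queue-based estimate is about 0.4 s, well below the 2062 ms that gstat reports, which suggests the latency is worth sampling over a longer window before drawing conclusions.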
--
Kees Jan

http://java-monitor.com/

+31651838192

The secret of success lies in the stability of the goal. -- Benjamin Disraeli

_______________________________________________
freebsd- mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-"

  #2  
29-05-2012 08:32 PM

On 5/29/2012 12:26 PM, Kees Jan Koster wrote:
> I seem to have a problem where really heavy disk I/O is drowning my machine.

Assuming you're using the default scheduler (SCHED_ULE), try switching
to the 4BSD scheduler in your kernel config file and see if that helps.
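
A minimal sketch of what that switch might involve, assuming the custom kernel config is named CUMIN (as the uname output in the original post suggests) and lives in the usual /usr/src/sys/amd64/conf/ directory. The path is an assumption, so adjust it to wherever the config actually is.

```shell
# Print the scheduler compiled into the running kernel (ULE or 4BSD).
sysctl -n kern.sched.name

# In the custom kernel config file (assumed: /usr/src/sys/amd64/conf/CUMIN),
# replace the line
#     options  SCHED_ULE
# with
#     options  SCHED_4BSD
# then rebuild, install, and reboot into the new kernel:
cd /usr/src
make buildkernel KERNCONF=CUMIN
make installkernel KERNCONF=CUMIN
shutdown -r now
```

After the reboot, the sysctl at the top should report 4BSD if the change took effect.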

Doug

--

This .signature sanitized for your protection

  #3  
29-05-2012 08:34 PM

On Tue, May 29, 2012 at 12:26 PM, Kees Jan Koster <> wrote:
> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>
> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>
> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>
> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to serve the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>
> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>
> All file systems are mounted "noatime, soft-updates".

You may want to play around with gsched, the GEOM Scheduler.

Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
order to satisfy write threads (or the other way around?). But,
adding gsched into the mix helped things immensely, allowing mixed
reads/writes to better share disk I/O resources.

I'll see if I can dig up a link to his testing e-mail messages.
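
For reference, a minimal sketch of trying gsched on the busy disk, based on my reading of the gsched(8) examples. It assumes ada3 is the provider to schedule (as in the original post) and that the round-robin ("rr") algorithm module is available on the system. Run it as root, preferably on a test box first.

```shell
# Load the GEOM scheduler framework and the round-robin algorithm module.
kldload geom_sched
kldload gsched_rr

# Transparently insert the "rr" scheduler on the existing provider.
geom sched insert -a rr ada3

# Check that the scheduler node is present.
geom sched status

# To undo: destroy the scheduler node (the trailing dot is part of its name).
geom sched destroy -v ada3.sched.
```

Watching gstat before and after the insert should show whether ms/w and L(q) on ada3 change.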

--
Freddie Cash


  #4  
29-05-2012 08:40 PM

On Tue, May 29, 2012 at 12:34 PM, Freddie Cash <> wrote:
> On Tue, May 29, 2012 at 12:26 PM, Kees Jan Koster <> wrote:
>> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>>
>> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>>
>> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>>
>> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to serve the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>>
>> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>>
>> All file systems are mounted "noatime, soft-updates".
>
> You may want to play around with gsched, the GEOM Scheduler.
>
> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
> DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
> order to satisfy write threads (or the other way around?).  But,
> adding gsched into the mix helped things immensely, allowing mixed
> reads/writes to better share disk I/O resources.
>
> I'll see if I can dig up a link to his testing e-mail messages.

Here's the post, part of a thread on benchmarking RAID controllers:

http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html


--
Freddie Cash


  #5  
29-05-2012 09:39 PM

On Tue, May 29, 2012 at 09:26:32PM +0200, Kees Jan Koster wrote:
> Dear All,
>
> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>
> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>
> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>
> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to serve the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>
> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>
> All file systems are mounted "noatime, soft-updates".
>
> Details:
>
> # uname -a
> FreeBSD cumin.java-monitor.com 9.0-STABLE FreeBSD 9.0-STABLE #0: Mon Mar 26 14:30:19 UTC 2012 :/usr/obj/usr/src/sys/CUMIN amd64
> # gstat -f 'ada[0-3]$' -b
> dT: 1.001s  w: 1.000s  filter: ada[0-3]$
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w  %busy  Name
>     0      0      0      0    0.0      0      0    0.0    0.0  ada0
>     0      0      0      0    0.0      0      0    0.0    0.0  ada1
>     0      0      0      0    0.0      0      0    0.0    0.0  ada2
>   103    273      0      0    0.0    273  34630   2062  121.9  ada3
> # camcontrol devlist
> at scbus1 target 0 lun 0 (pass0,ada0)
> at scbus2 target 0 lun 0 (pass1,ada1)
> at scbus3 target 0 lun 0 (pass2,ada2)
> at scbus4 target 0 lun 0 (pass3,ada3)
> at scbus7 target 0 lun 0 (pass4,cd0)
> at scbus8 target 0 lun 0 (pass5,cd1)


Check the SSD for its internal block size and make sure your filesystem
and partitions are aligned with the disk block size. Unless there
is something wrong with your SATA controller, I'd expect a lot more than
273 IOPS and ~30 MByte/sec from an SSD.
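
A minimal sketch of the alignment arithmetic, assuming you take the partition's start sector from `gpart show ada3` and the sector size from `diskinfo -v ada3`. The helper name is_aligned and the 4096-byte alignment target are assumptions for illustration. Many SSDs do not report their real internal page or erase-block size, so a larger target such as 1 MiB may be the safer thing to test against.

```shell
# is_aligned START_SECTOR SECTOR_BYTES ALIGN_BYTES
# Prints "aligned" if the partition's byte offset is a multiple of
# ALIGN_BYTES, otherwise "misaligned". Hypothetical helper.
is_aligned() {
    offset=$(( $1 * $2 ))
    if [ $(( offset % $3 )) -eq 0 ]; then
        echo aligned
    else
        echo misaligned
    fi
}

# Hypothetical start sectors, as they might appear in `gpart show ada3`:
is_aligned 63 512 4096      # classic MBR start sector: 32256 bytes
is_aligned 2048 512 4096    # 1 MiB start: 1048576 bytes
```

The first call reports misaligned (32256 is not a multiple of 4096) and the second reports aligned. A misaligned start means a filesystem block can straddle two physical blocks on the drive, which can noticeably slow writes.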

Regards,

Gary

  #6  
29-05-2012 09:54 PM

Dear All,

I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.

In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.

To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.

Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to servr the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?

I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?

All file systems are mounted "noatime, soft-updates".

Details:

# uname -a
FreeBSD cumin.java-monitor.com 9.0-STABLE FreeBSD 9.0-STABLE #0: Mon Mar 26 14:30:19 UTC 2012 :/usr/obj/usr/src/sys/CUMIN amd64
# gstat -f 'ada[0-3]$' -b
dT: 1.001s w: 1.000s filter: ada[0-3]$
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      0      0      0    0.0      0      0    0.0     0.0 ada0
    0      0      0      0    0.0      0      0    0.0     0.0 ada1
    0      0      0      0    0.0      0      0    0.0     0.0 ada2
  103    273      0      0    0.0    273  34630   2062   121.9 ada3
# camcontrol devlist
at scbus1 target 0 lun 0 (pass0,ada0)
at scbus2 target 0 lun 0 (pass1,ada1)
at scbus3 target 0 lun 0 (pass2,ada2)
at scbus4 target 0 lun 0 (pass3,ada3)
at scbus7 target 0 lun 0 (pass4,cd0)
at scbus8 target 0 lun 0 (pass5,cd1)
# _
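
For a rough reading of the ada3 line above, the sketch below derives the average write size and the write throughput from the gstat columns. The function name gstat_write_summary is made up for illustration, and it assumes the column order shown in the gstat header and that kBps counts 1024-byte units.

```shell
# Summarise the write side of one `gstat -b` line.
# Assumed column order: L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
gstat_write_summary() {
    echo "$1" | awk '{
        # $6 = w/s, $7 = write kBps, $8 = ms/w, $10 = device name
        if ($6 > 0)
            printf "%s: %.1f kB per write, %.1f MB/s written, %d ms per write\n",
                   $10, $7 / $6, $7 / 1024, $8
    }'
}

gstat_write_summary "103 273 0 0 0.0 273 34630 2062 121.9 ada3"
```

On the sample line this works out to roughly 127 kB per write at about 34 MB/s, so the load is a modest number of fairly large writes, not a flood of tiny ones.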
--
Kees Jan

http://java-monitor.com/

+31651838192

The secret of success lies in the stability of the goal. -- Benjamin Disraeli

On 5/29/2012 12:26 PM, Kees Jan Koster wrote:
> I seem to have a problem where really heavy disk I/O is drowning my machine.

Assuming you're using the default scheduler (SCHED_ULE), try switching
to the 4BSD scheduler in your kernel config file and see if that helps.

Doug

--

This .signature sanitized for your protection

On Tue, May 29, 2012 at 12:26 PM, Kees Jan Koster wrote:
> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>
> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>
> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>
> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to serve the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>
> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>
> All file systems are mounted "noatime, soft-updates".

You may want to play around with gsched, the GEOM Scheduler.

Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
order to satisfy write threads (or the other way around?). But,
adding gsched into the mix helped things immensely, allowing mixed
reads/writes to better share disk I/O resources.

I'll see if I can dig up a link to his testing e-mail messages.

--
Freddie Cash

On Tue, May 29, 2012 at 12:34 PM, Freddie Cash wrote:
> On Tue, May 29, 2012 at 12:26 PM, Kees Jan Koster wrote:
>> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>>
>> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>>
>> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>>
>> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to serve the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>>
>> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>>
>> All file systems are mounted "noatime, soft-updates".
>
> You may want to play around with gsched, the GEOM Scheduler.
>
> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
> DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
> order to satisfy write threads (or the other way around?). But,
> adding gsched into the mix helped things immensely, allowing mixed
> reads/writes to better share disk I/O resources.
>
> I'll see if I can dig up a link to his testing e-mail messages.

Here's the post, part of a thread on benchmarking RAID controllers:

http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html


--
Freddie Cash

On Tue, May 29, 2012 at 09:26:32PM +0200, Kees Jan Koster wrote:
> Dear All,
>
> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>
> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>
> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>
> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to serve the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>
> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>
> All file systems are mounted "noatime, soft-updates".
>
> Details:
>
> # uname -a
> FreeBSD cumin.java-monitor.com 9.0-STABLE FreeBSD 9.0-STABLE #0: Mon Mar 26 14:30:19 UTC 2012 :/usr/obj/usr/src/sys/CUMIN amd64
> # gstat -f 'ada[0-3]$' -b
> dT: 1.001s w: 1.000s filter: ada[0-3]$
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      0      0      0    0.0      0      0    0.0     0.0 ada0
>     0      0      0      0    0.0      0      0    0.0     0.0 ada1
>     0      0      0      0    0.0      0      0    0.0     0.0 ada2
>   103    273      0      0    0.0    273  34630   2062   121.9 ada3
> # camcontrol devlist
> at scbus1 target 0 lun 0 (pass0,ada0)
> at scbus2 target 0 lun 0 (pass1,ada1)
> at scbus3 target 0 lun 0 (pass2,ada2)
> at scbus4 target 0 lun 0 (pass3,ada3)
> at scbus7 target 0 lun 0 (pass4,cd0)
> at scbus8 target 0 lun 0 (pass5,cd1)


Check the SSD for its internal block size and make sure your filesystem
and partitions are aligned with the disk block size. Unless there
is something wrong with your SATA controller I'd expect a lot more than
273 IOPS and ~30 MByte/sec from an SSD.
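
One way to sanity-check alignment is to multiply a partition's starting sector by the sector size and test the result against the boundary you care about. The helper below is only a sketch: is_aligned is a hypothetical name, the 4096-byte boundary is an assumption (the drive's real page or erase-block size may be larger), and the start sectors are illustrative values, not taken from this machine. On FreeBSD, `gpart show ada3` lists the starting sectors and `diskinfo -v ada3` reports the sector size.

```shell
# Report whether a partition start is a multiple of a given boundary.
#   $1  starting sector of the partition
#   $2  sector size in bytes (commonly 512)
#   $3  boundary to test in bytes (4096 here is an assumption)
is_aligned() {
    if [ $(( ($1 * $2) % $3 )) -eq 0 ]; then
        echo aligned
    else
        echo misaligned
    fi
}

is_aligned 63 512 4096      # traditional MBR start at sector 63
is_aligned 2048 512 4096    # start at 1 MiB
```

A start at sector 63 lands at byte 32256, which is not a multiple of 4096, whereas a 1 MiB start is a multiple of every power-of-two boundary up to 1 MiB.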

Regards,

Gary

Dear Freddie,

>> You may want to play around with gsched, the GEOM Scheduler.
>>
>> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
>> DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
>> order to satisfy write threads (or the other way around?). But,
>> adding gsched into the mix helped things immensely, allowing mixed
>> reads/writes to better share disk I/O resources.
>>
>> I'll see if I can dig up a link to his testing e-mail messages.
>
> Here's the post, part of a thread on benchmarking RAID controllers:
>
> http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html


*cloink* /me goes to pick my jaw up off the floor. I can insert an I/O scheduler in full flight? Ok. I need to adjust my mental image of the world a bit.

I just played with the examples on a test machine and the effect is quite visible. I ran a CVS checkout of the ports collection concurrent with dd writing a massive file. Insert scheduler -> CVS update is faster; destroy scheduler -> CVS update crawls. This is so easy it's almost scary.
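
For anyone who wants to repeat that experiment, the sequence looks roughly like the sketch below. It is written from memory of the examples in gsched(8), so check the man page on your release first. The run wrapper, the gsched_experiment function and the DRY_RUN variable are made up for illustration, and by default the commands are only printed, not executed.

```shell
# Dry run by default: each command is printed instead of executed.
# Set DRY_RUN=0 and run as root on a FreeBSD release that ships geom_sched.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

gsched_experiment() {
    disk=$1                                   # e.g. ada3 in this thread
    run kldload geom_sched                    # GEOM scheduler framework
    run kldload gsched_rr                     # round-robin algorithm
    run geom sched insert -a rr "$disk"       # put the scheduler in front of the live provider
    # ... run the mixed read/write workload here and watch gstat ...
    run geom sched destroy -v "$disk.sched."  # take it out again
}

gsched_experiment ada3
```

I believe geom_sched was dropped from later FreeBSD releases, so confirm the module exists before relying on this.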

The behaviour that Matt describes is what I thought I was seeing too: write a *lot* and it becomes hard to read from the disk. In my system, writing data is largely asynchronous and can lag the actual arrival of data by as much as a few minutes. Reads are always synchronous to a user request and need to be served asap. Some writes are database writes and they should be serviced quickly too.

This is definitely something I need to look into. Thank you for the reference.
--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.


  #7  
29-05-2012 09:55 PM
Freebsd-stable member admin is online now
User
 

Dear Doug,

>> I seem to have a problem where really heavy disk I/O is drowning my machine.
>
> Assuming you're using the default scheduler (SCHED_ULE), try switching
> to the 4BSD scheduler in your kernel config file and see if that helps.


I will, thanks for the suggestion.
--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.


  #8  
29-05-2012 09:59 PM
Freebsd-stable member admin is online now
User
 

Dear All,

I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.

In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.

To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.

Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to servr the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?

I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?

All file systems are mounted "noatime, soft-updates".

Details:

# uname -a
FreeBSD cumin.java-monitor.com 9.0-STABLE FreeBSD 9.0-STABLE #0: Mon Mar 26 14:30:19 UTC 2012 :/usr/obj/usr/src/sys/CUMIN amd64
# gstat -f 'ada[0-3]$' -b
dT: 1.001s w: 1.000s filter: ada[0-3]$
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
0 0 0 0 0.0 0 0 0.0 0.0 ada0
0 0 0 0 0.0 0 0 0.0 0.0 ada1
0 0 0 0 0.0 0 0 0.0 0.0 ada2
103 273 0 0 0.0 273 34630 2062 121.9 ada3
# camcontrol devlist
at scbus1 target 0 lun 0 (pass0,ada0)
at scbus2 target 0 lun 0 (pass1,ada1)
at scbus3 target 0 lun 0 (pass2,ada2)
at scbus4 target 0 lun 0 (pass3,ada3)
at scbus7 target 0 lun 0 (pass4,cd0)
at scbus8 target 0 lun 0 (pass5,cd1)
# _
--
Kees Jan

http://java-monitor.com/

+31651838192

The secret of success lies in the stability of the goal. -- Benjamin Disraeli

_______________________________________________
freebsd- mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-"
)
On 5/29/2012 12:26 PM, Kees Jan Koster wrote:
> I seem to have a problem where really heavy disk I/O is drowning my machine.

Assuming you're using the default scheduler (SCHED_ULE), try switching
to the 4BSD scheduler in your kernel config file and see if that helps.

Doug

--

This .signature sanitized for your protection
_______________________________________________
freebsd- mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-"
)
On Tue, May 29, 2012 at 12:26 PM, Kees Jan Koster <> wrote:
> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>
> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>
> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>
> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to servr the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>
> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>
> All file systems are mounted "noatime, soft-updates".

You may want to play around with gshed, the GEOM Scheduler.

Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
order to satisfy write threads (or the other way around?). But,
adding gsched into the mix helped things immensely, allowing mixed
reads/writes to better shares disk I/O resources.

I'll see if I can dig up a link to his testing e-mail messages.

--
Freddie Cash

_______________________________________________
freebsd- mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-"
)
On Tue, May 29, 2012 at 12:34 PM, Freddie Cash <> wrote:
> On Tue, May 29, 2012 at 12:26 PM, Kees Jan Koster <> wrote:
>> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>>
>> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>>
>> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>>
>> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to servr the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>>
>> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>>
>> All file systems are mounted "noatime, soft-updates".
>
> You may want to play around with gsched, the GEOM Scheduler.
>
> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
> DragonFlyBSD+HAMMER and found that FreeBSD starves read threads in
> order to satisfy write threads (or the other way around?).  But,
> adding gsched into the mix helped things immensely, allowing mixed
> reads/writes to better share disk I/O resources.
>
> I'll see if I can dig up a link to his testing e-mail messages.

Here's the post, part of a thread on benchmarking RAID controllers:

http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html


--
Freddie Cash

On Tue, May 29, 2012 at 09:26:32PM +0200, Kees Jan Koster wrote:
> Dear All,
>
> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>
> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>
> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>
> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to serve the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>
> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>
> All file systems are mounted "noatime, soft-updates".
>
> Details:
>
> # uname -a
> FreeBSD cumin.java-monitor.com 9.0-STABLE FreeBSD 9.0-STABLE #0: Mon Mar 26 14:30:19 UTC 2012 :/usr/obj/usr/src/sys/CUMIN amd64
> # gstat -f 'ada[0-3]$' -b
> dT: 1.001s w: 1.000s filter: ada[0-3]$
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w  %busy  Name
>     0      0      0      0    0.0      0      0    0.0    0.0  ada0
>     0      0      0      0    0.0      0      0    0.0    0.0  ada1
>     0      0      0      0    0.0      0      0    0.0    0.0  ada2
>   103    273      0      0    0.0    273  34630   2062  121.9  ada3
> # camcontrol devlist
> at scbus1 target 0 lun 0 (pass0,ada0)
> at scbus2 target 0 lun 0 (pass1,ada1)
> at scbus3 target 0 lun 0 (pass2,ada2)
> at scbus4 target 0 lun 0 (pass3,ada3)
> at scbus7 target 0 lun 0 (pass4,cd0)
> at scbus8 target 0 lun 0 (pass5,cd1)


Check the SSD for its internal block size and make sure your filesystem
and partitions are aligned with the disk block size. Unless there
is something wrong with your SATA controller I'd expect a lot more than
273 IOPS and ~30 MB/s from an SSD.

Regards,

Gary
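
The ada3 row in the quoted gstat output can be sanity-checked with a little arithmetic. The following is only a rough sketch: it assumes the kBps column is kilobytes per second, and it applies Little's law to a single one-second sample, which need not be in steady state, so the estimate does not have to agree with the ms/w figure gstat reports.

```python
# Figures copied from the ada3 row of the gstat output quoted in this thread.
queue_length = 103            # L(q): requests outstanding
writes_per_sec = 273          # w/s
write_kbps = 34630            # kBps written (assumed to mean kB per second)
reported_ms_per_write = 2062  # ms/w as printed by gstat


def average_write_kb(kbps: float, ops_per_sec: float) -> float:
    """Average size of one write request, in kB."""
    return kbps / ops_per_sec


def littles_law_wait_ms(queue_len: float, ops_per_sec: float) -> float:
    """Little's law: mean time in the system = queue length / throughput."""
    return 1000.0 * queue_len / ops_per_sec


avg_kb = average_write_kb(write_kbps, writes_per_sec)
wait_ms = littles_law_wait_ms(queue_length, writes_per_sec)

print(f"average write size: {avg_kb:.1f} kB")
print(f"queueing estimate:  {wait_ms:.0f} ms per request")
print(f"gstat reported:     {reported_ms_per_write} ms per write")
```

With these numbers the average write is roughly 127 kB, so the drive is seeing fairly large writes rather than a flood of tiny ones.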
Dear Freddie,

>> You may want to play around with gsched, the GEOM Scheduler.
>>
>> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
>> DragonFlyBSD+HAMMER and found that FreeBSD starves read threads in
>> order to satisfy write threads (or the other way around?). But,
>> adding gsched into the mix helped things immensely, allowing mixed
>> reads/writes to better share disk I/O resources.
>>
>> I'll see if I can dig up a link to his testing e-mail messages.
>
> Here's the post, part of a thread on benchmarking RAID controllers:
>
> http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html


*cloink* /me goes to pick my jaw up off the floor. I can insert an I/O scheduler in full flight? Ok. I need to adjust my mental image of the world a bit.

I just played with the examples on a test machine and the effect is quite visible. I ran a CVS checkout of the ports collection concurrently with dd writing a massive file. Insert scheduler -> CVS checkout is faster; destroy scheduler -> CVS checkout crawls. This is so easy it's almost scary.

The behaviour that Matt describes is what I thought I was seeing too: write a *lot* and it becomes hard to read from the disk. In my system, writing data is largely asynchronous and can lag the actual arrival of data by as much as a few minutes. Reads are always synchronous to a user request and need to be served asap. Some writes are database writes and they should be serviced quickly too.

This is definitely something I need to look into. Thank you for the reference.
--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

Dear Doug,

>> I seem to have a problem where really heavy disk I/O is drowning my machine.
>
> Assuming you're using the default scheduler (SCHED_ULE), try switching
> to the 4BSD scheduler in your kernel config file and see if that helps.


I will, thanks for the suggestion.
--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

Dear Gary,

>> # camcontrol devlist
>> at scbus1 target 0 lun 0 (pass0,ada0)
>> at scbus2 target 0 lun 0 (pass1,ada1)
>> at scbus3 target 0 lun 0 (pass2,ada2)
>> at scbus4 target 0 lun 0 (pass3,ada3)
>> at scbus7 target 0 lun 0 (pass4,cd0)
>> at scbus8 target 0 lun 0 (pass5,cd1)
>
> Check the SSD for its internal block size and make sure your filesystem
> and partitions are aligned with the disk block size. Unless there
> is something wrong with your SATA controller I'd expect a lot more than
> 273 IOPS and ~30 MB/s from an SSD.


Thank you for suggesting this. However, I recently went through my file systems to fix disk alignment. I ended up aligning them to 1 MB boundaries, which raised the throughput from 6 MB/s to about 60-80 MB/s, which is what I am seeing today.

# gpart show
...
=>       34  250069613  ada3  GPT  (119G)
         34       2014        - free -  (1M)
       2048  250067599     1  freebsd-ufs  (119G)

Do you think I need to revisit alignment?

--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.
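
The start sector in the gpart output above can be checked against a few candidate boundaries. This is a minimal sketch that assumes 512-byte logical sectors (the gpart output does not show the sector size) and only looks at where the partition starts; it says nothing about the file system's own block and fragment sizes or the SSD's internal erase-block size.

```python
SECTOR_BYTES = 512  # assumption: logical sector size, not shown in the gpart output


def is_aligned(start_sector: int, boundary_bytes: int,
               sector_bytes: int = SECTOR_BYTES) -> bool:
    """True if the byte offset of start_sector is a multiple of boundary_bytes."""
    return (start_sector * sector_bytes) % boundary_bytes == 0


partition_start = 2048  # start of ada3 partition 1, from "gpart show"

for label, boundary in [("4 KiB", 4096),
                        ("128 KiB", 128 * 1024),
                        ("1 MiB", 1024 * 1024)]:
    print(f"{label:>8}: {is_aligned(partition_start, boundary)}")
```

Under that assumption sector 2048 sits exactly 1 MiB into the disk, so the partition start is aligned to every power-of-two boundary up to 1 MiB.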


  #9  
29-05-2012 10:12 PM
Freebsd-stable member admin is online now
User
 

Dear Freddie,

>> You may want to play around with gsched, the GEOM Scheduler.
>>
>> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
>> DragonFlyBSD+HAMMER and found that FreeBSD starves read threads in
>> order to satisfy write threads (or the other way around?). But,
>> adding gsched into the mix helped things immensely, allowing mixed
>> reads/writes to better share disk I/O resources.
>>
>> I'll see if I can dig up a link to his testing e-mail messages.
>
> Here's the post, part of a thread on benchmarking RAID controllers:
>
> http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html

I looked at "sysctl kern.geom.confdot" (another ridiculously useful feature) to see where the scheduler should be placed.

The way I was thinking, I should place a scheduler in such a way that writes to one physical device (ada3 in my case) do not cause reads on another device to stall (e.g. ada2, where the database lives). However, it looks like the GEOM tree is actually a GEOM bush, with a separate tree for each device.

Am I missing something? Is there a way to schedule across devices? Is the bush a tree after all, maybe?
--
Kees Jan

http://java-monitor.com/

+31651838192

The secret of success lies in the stability of the goal. -- Benjamin Disraeli
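
The "tree or bush" question can be made concrete by counting connected components in the graph that kern.geom.confdot describes. The sketch below is illustrative only: SAMPLE_DOT is a hypothetical, heavily simplified stand-in with readable node names, not real sysctl output, and the only thing the code relies on is the DOT edge syntax "a -> b".

```python
import re
from collections import defaultdict

# Hypothetical, simplified stand-in for `sysctl -n kern.geom.confdot` output.
# The node names are invented for readability.
SAMPLE_DOT = """
digraph geom {
  ada2 -> ada2p1;
  ada3 -> ada3p1;
  ada3p1 -> ufs_ada3p1;
}
"""


def components(dot_text: str) -> list[list[str]]:
    """Group nodes into connected components, ignoring edge direction."""
    neighbours = defaultdict(set)
    for a, b in re.findall(r"(\w+)\s*->\s*(\w+)", dot_text):
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen: set[str] = set()
    groups = []
    for node in sorted(neighbours):
        if node in seen:
            continue
        # Depth-first walk collecting everything reachable from this node.
        stack, group = [node], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


for group in components(SAMPLE_DOT):
    print(group)
```

If each physical disk ends up in its own component, as in this made-up sample, the graph is a forest, which matches the "bush" impression: nothing in such a graph sits above both ada2 and ada3.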


  #10  
29-05-2012 10:24 PM
Freebsd-stable member admin is online now
User
 

On Tue, May 29, 2012 at 09:26:32PM +0200, Kees Jan Koster wrote:
> Dear All,
>
> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>
> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>
> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>
> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to serve the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>
> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>
> All file systems are mounted "noatime, soft-updates".
>
> Details:
>
> # uname -a
> FreeBSD cumin.java-monitor.com 9.0-STABLE FreeBSD 9.0-STABLE #0: Mon Mar 26 14:30:19 UTC 2012 :/usr/obj/usr/src/sys/CUMIN amd64
> # gstat -f 'ada[0-3]$' -b
> dT: 1.001s w: 1.000s filter: ada[0-3]$
> L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w  %busy Name
>    0      0      0      0    0.0      0      0    0.0    0.0 ada0
>    0      0      0      0    0.0      0      0    0.0    0.0 ada1
>    0      0      0      0    0.0      0      0    0.0    0.0 ada2
>  103    273      0      0    0.0    273  34630   2062  121.9 ada3
> # camcontrol devlist
> at scbus1 target 0 lun 0 (pass0,ada0)
> at scbus2 target 0 lun 0 (pass1,ada1)
> at scbus3 target 0 lun 0 (pass2,ada2)
> at scbus4 target 0 lun 0 (pass3,ada3)
> at scbus7 target 0 lun 0 (pass4,cd0)
> at scbus8 target 0 lun 0 (pass5,cd1)


Check the SSD for its internal block size and make sure your filesystem
and partitions are aligned with the disk block size. Unless there
is something wrong with your SATA controller I'd expect a lot more than
273 IOPS and ~30 MByte/sec from an SSD.
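
To put those two figures in context, here is a small sketch that derives the average write size from the ada3 row of the gstat output quoted above. It assumes gstat's kBps column means kilobytes per second and uses integer division, so the results are rounded down; the variable names are made up for illustration.

```shell
#!/bin/sh
# Figures copied from the ada3 row of the quoted gstat output.
writes_per_sec=273
write_kbps=34630

# Average size of one write request in kB (integer division, rounds down).
avg_write_kb=$(( write_kbps / writes_per_sec ))
# Write throughput in MB/s, taking 1 MB = 1024 kB (integer division).
write_mbps=$(( write_kbps / 1024 ))

echo "average write size: ${avg_write_kb} kB"
echo "write throughput:   ${write_mbps} MB/s"
```

If that reading is right, the writes are fairly large (well over 100 kB each), so the low operation count goes together with large requests rather than with a flood of tiny ones. It does not by itself explain the 2 s service time.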

Regards,

Gary
_______________________________________________
Dear Freddie,

>> You may want to play around with gsched, the GEOM Scheduler.
>>
>> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
>> DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
>> order to satisfy write threads (or the other way around?). But,
>> adding gsched into the mix helped things immensely, allowing mixed
>> reads/writes to better share disk I/O resources.
>>
>> I'll see if I can dig up a link to his testing e-mail messages.
>
> Here's the post, part of a thread on benchmarking RAID controllers:
>
> http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html


*cloink* /me goes to pick my jaw off of the floor. I can insert an I/O scheduler in full flight? Ok. I need to adjust my mental image of the world a bit.

I just played with the examples on a test machine and the effect is quite visible. I ran a CVS checkout of the ports collection concurrent with dd writing a massive file. Insert scheduler -> CVS update is faster; destroy scheduler -> CVS update crawls. This is so easy it's almost scary.
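
For anyone wanting to repeat the insert/destroy experiment, the sequence might look roughly like the sketch below. It is a dry run that only prints the commands, since the real ones need root on a FreeBSD system that ships geom_sched. The command forms follow my recollection of the gsched(8) manual page examples, so check them against the local man page; the provider name ada3 and the "rr" algorithm are assumptions chosen for illustration, and `gsched_commands` is a hypothetical helper name.

```shell
#!/bin/sh
# Dry-run sketch: prints gsched(8) commands instead of running them.
# The "rr" algorithm and the provider name are illustrative assumptions.
gsched_commands() {
    provider=$1
    echo "kldload geom_sched"                        # core GEOM scheduler module
    echo "kldload gsched_rr"                         # round-robin algorithm module
    echo "geom sched insert -a rr ${provider}"       # insert under the live provider
    echo "geom sched list"                           # inspect the new .sched. node
    echo "geom sched destroy -v ${provider}.sched."  # take the scheduler out again
}

gsched_commands ada3
```

Running the printed insert and destroy lines while a read-heavy job competes with a large dd write is the kind of comparison described above.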

The behaviour that Matt describes is what I thought I was seeing too: write a *lot* and it becomes hard to read from the disk. In my system, writing data is largely asynchronous and can lag the actual arrival of data by as much as a few minutes. Reads are always synchronous to a user request and need to be served asap. Some writes are database writes and they should be serviced quickly too.

This is definitely something I need to look into. Thank you for the reference.
--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

_______________________________________________
Dear Doug,

>> I seem to have a problem where really heavy disk I/O is drowning my machine.
>
> Assuming you're using the default scheduler (SCHED_ULE), try switching
> to the 4BSD scheduler in your kernel config file and see if that helps.


I will, thanks for the suggestion.
--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

_______________________________________________
Dear Gary,

>> # camcontrol devlist
>> at scbus1 target 0 lun 0 (pass0,ada0)
>> at scbus2 target 0 lun 0 (pass1,ada1)
>> at scbus3 target 0 lun 0 (pass2,ada2)
>> at scbus4 target 0 lun 0 (pass3,ada3)
>> at scbus7 target 0 lun 0 (pass4,cd0)
>> at scbus8 target 0 lun 0 (pass5,cd1)
>
> Check the SSD for its internal block size and make sure your filesystem
> and partitions are aligned with the disk block size. Unless there
> is something wrong with your SATA controller I'd expect a lot more than
> 273 IOPS and ~30 MByte/sec from an SSD.


Thank you for suggesting this. However, I recently went through my file systems to fix disk alignment. I ended up aligning them to 1M blocks, which raised the throughput from 6 MB/s to about 60-80 MB/s, which is what I am seeing today.

# gpart show
...
=>       34  250069613  ada3  GPT  (119G)
         34       2014        - free -  (1M)
       2048  250067599     1  freebsd-ufs  (119G)

Do you think I need to revisit alignment?
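
As a quick sanity check on the numbers in that gpart output, the sketch below tests whether a partition's start sector falls on a 1 MiB boundary. It assumes 512-byte logical sectors, which is an assumption worth confirming with `diskinfo -v`; `is_aligned` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: is a partition start offset a multiple of 1 MiB?
# Assumes 512-byte logical sectors (confirm with diskinfo -v).
SECTOR_BYTES=512
ALIGN_BYTES=$((1024 * 1024))

is_aligned() {
    # $1 = start sector, first column of `gpart show`
    offset=$(( $1 * SECTOR_BYTES ))
    if [ $(( offset % ALIGN_BYTES )) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"
    fi
}

is_aligned 2048   # start of the freebsd-ufs partition on ada3
is_aligned 34     # first usable GPT sector, for comparison
```

Under that assumption sector 2048 sits exactly at 1 MiB, so the partition start looks fine. This says nothing about the filesystem's own block and fragment sizes or the SSD's internal erase-block size.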

--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

_______________________________________________
Dear Freddie,

>> You may want to play around with gsched, the GEOM Scheduler.
>>
>> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
>> DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
>> order to satisfy write threads (or the other way around?). But,
>> adding gsched into the mix helped things immensely, allowing mixed
>> reads/writes to better share disk I/O resources.
>>
>> I'll see if I can dig up a link to his testing e-mail messages.
>
> Here's the post, part of a thread on benchmarking RAID controllers:
>
> http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html

I looked at "sysctl kern.geom.confdot" (another ridiculously useful feature) to see where the scheduler should be placed.

The way I was thinking, I should place a scheduler in such a way that writes to one physical device (ada3 in my case) do not cause reads on another device to stall (e.g. ada2, where the database lives). However, it looks like the GEOM tree is actually a GEOM bush, with a separate tree for each device.

Am I missing something? Is there a way to schedule across devices? Is the bush a tree after all, maybe?
--
Kees Jan

http://java-monitor.com/

+31651838192

The secret of success lies in the stability of the goal. -- Benjamin Disraeli

_______________________________________________
On Tue, May 29, 2012 at 2:12 PM, Kees Jan Koster <> wrote:
>>> You may want to play around with gsched, the GEOM Scheduler.
>>>
>>> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
>>> DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
>>> order to satisfy write threads (or the other way around?).  But,
>>> adding gsched into the mix helped things immensely, allowing mixed
>>> reads/writes to better share disk I/O resources.
>>>
>>> I'll see if I can dig up a link to his testing e-mail messages.
>>
>> Here's the post, part of a thread on benchmarking RAID controllers:
>>
>> http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html
>
> I looked at "sysctl kern.geom.confdot" (another ridiculously useful feature) to see where the scheduler should be placed.
>
> The way I was thinking, I should place a scheduler in such a way that writes to one physical device (ada3 in my case) do not cause reads on another device to stall (e.g. ada2, where the database lives). However, it looks like the GEOM tree is actually a GEOM bush, with a separate tree for each device.
>
> Am I missing something? Is there a way to schedule across devices? Is the bush a tree after all, maybe?

There are others much better versed in the ways of GEOM than I, and
hopefully they will jump in to simplify/clarify things. :)

The way I understand things is that GEOM is a per-device stack of GEOM
classes, with the physical device at the bottom, and the VM/block/I/O
(?) system at the top. Thus, unless you use one of the multi-device
GEOM classes (graid, gmirror, gstripe, gvinum), then each stack is
independent of the others.

Meaning gsched only works for a single stack (i.e., a single device).

Granted, I haven't played with gsched yet (most of our high-I/O
systems are ZFS), so there may be a way to use it across-GEOMs.
--
Freddie Cash

_______________________________________________

  #11  
29-05-2012 10:30 PM
Freebsd-stable member admin is online now
User
 

Dear Freddie,

> Granted, I haven't played with gsched yet (most of our high-I/O
> systems are ZFS), so there may be a way to use it across-GEOMs.

From my previous experiments ZFS suffers the same fate when there is heavy write activity. Reads just don't get served in time.

How do you deal with that?
--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

_______________________________________________

  #12  
29-05-2012 10:39 PM
Freebsd-stable member admin is online now
User
 

Dear All,

I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.

In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.

To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.

Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to servr the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?

I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?

All file systems are mounted "noatime, soft-updates".

Details:

# uname -a
FreeBSD cumin.java-monitor.com 9.0-STABLE FreeBSD 9.0-STABLE #0: Mon Mar 26 14:30:19 UTC 2012 :/usr/obj/usr/src/sys/CUMIN amd64
# gstat -f 'ada[0-3]$' -b
dT: 1.001s w: 1.000s filter: ada[0-3]$
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
0 0 0 0 0.0 0 0 0.0 0.0 ada0
0 0 0 0 0.0 0 0 0.0 0.0 ada1
0 0 0 0 0.0 0 0 0.0 0.0 ada2
103 273 0 0 0.0 273 34630 2062 121.9 ada3
# camcontrol devlist
at scbus1 target 0 lun 0 (pass0,ada0)
at scbus2 target 0 lun 0 (pass1,ada1)
at scbus3 target 0 lun 0 (pass2,ada2)
at scbus4 target 0 lun 0 (pass3,ada3)
at scbus7 target 0 lun 0 (pass4,cd0)
at scbus8 target 0 lun 0 (pass5,cd1)
# _
--
Kees Jan

http://java-monitor.com/

+31651838192

The secret of success lies in the stability of the goal. -- Benjamin Disraeli

_______________________________________________
freebsd- mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-"
)
On 5/29/2012 12:26 PM, Kees Jan Koster wrote:
> I seem to have a problem where really heavy disk I/O is drowning my machine.

Assuming you're using the default scheduler (SCHED_ULE), try switching
to the 4BSD scheduler in your kernel config file and see if that helps.

Doug

--

This .signature sanitized for your protection
_______________________________________________
freebsd- mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-"
)
On Tue, May 29, 2012 at 12:26 PM, Kees Jan Koster <> wrote:
> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>
> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>
> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>
> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to serve the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>
> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>
> All file systems are mounted "noatime, soft-updates".

You may want to play around with gsched, the GEOM Scheduler.

Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
order to satisfy write threads (or the other way around?). But
adding gsched into the mix helped things immensely, allowing mixed
reads/writes to better share disk I/O resources.

I'll see if I can dig up a link to his testing e-mail messages.

--
Freddie Cash

On Tue, May 29, 2012 at 12:34 PM, Freddie Cash <> wrote:
> On Tue, May 29, 2012 at 12:26 PM, Kees Jan Koster <> wrote:
>> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>>
>> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>>
>> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>>
>> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to serve the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>>
>> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>>
>> All file systems are mounted "noatime, soft-updates".
>
> You may want to play around with gsched, the GEOM Scheduler.
>
> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
> DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
> order to satisfy write threads (or the other way around?). But
> adding gsched into the mix helped things immensely, allowing mixed
> reads/writes to better share disk I/O resources.
>
> I'll see if I can dig up a link to his testing e-mail messages.

Here's the post, part of a thread on benchmarking RAID controllers:

http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html


--
Freddie Cash

On Tue, May 29, 2012 at 09:26:32PM +0200, Kees Jan Koster wrote:
> Dear All,
>
> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>
> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>
> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>
> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to serve the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>
> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>
> All file systems are mounted "noatime, soft-updates".
>
> Details:
>
> # uname -a
> FreeBSD cumin.java-monitor.com 9.0-STABLE FreeBSD 9.0-STABLE #0: Mon Mar 26 14:30:19 UTC 2012 :/usr/obj/usr/src/sys/CUMIN amd64
> # gstat -f 'ada[0-3]$' -b
> dT: 1.001s w: 1.000s filter: ada[0-3]$
> L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>    0      0      0      0    0.0      0      0    0.0     0.0 ada0
>    0      0      0      0    0.0      0      0    0.0     0.0 ada1
>    0      0      0      0    0.0      0      0    0.0     0.0 ada2
>  103    273      0      0    0.0    273  34630   2062   121.9 ada3
> # camcontrol devlist
> at scbus1 target 0 lun 0 (pass0,ada0)
> at scbus2 target 0 lun 0 (pass1,ada1)
> at scbus3 target 0 lun 0 (pass2,ada2)
> at scbus4 target 0 lun 0 (pass3,ada3)
> at scbus7 target 0 lun 0 (pass4,cd0)
> at scbus8 target 0 lun 0 (pass5,cd1)


Check the SSD for its internal block size and make sure your filesystem
and partitions are aligned with the disk block size. Unless there
is something wrong with your SATA controller, I'd expect a lot more than
273 IOPS and ~30 MB/s from an SSD.
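
For scale, the ada3 row of the gstat output can be turned into an average write size. This is only a rough sketch: it assumes the column order of the gstat header quoted in this thread (field 6 is w/s, field 7 is write kBps), and the function name avg_write_kb is made up for illustration.

```shell
#!/bin/sh
# Column order assumed from the gstat header quoted in this thread:
# L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
# Prints the average size of one write in kB for each input row.
avg_write_kb() {
    awk '{ if ($6 > 0) printf "%.1f\n", $7 / $6; else print "0.0" }'
}

# The ada3 row from the original post:
echo "103 273 0 0 0.0 273 34630 2062 121.9 ada3" | avg_write_kb
```

On that row this works out to roughly 127 kB per write, so the 273 writes per second are fairly large requests rather than a flood of tiny ones.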

Regards,

Gary
Dear Freddie,

>> You may want to play around with gsched, the GEOM Scheduler.
>>
>> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
>> DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
>> order to satisfy write threads (or the other way around?). But
>> adding gsched into the mix helped things immensely, allowing mixed
>> reads/writes to better share disk I/O resources.
>>
>> I'll see if I can dig up a link to his testing e-mail messages.
>
> Here's the post, part of a thread on benchmarking RAID controllers:
>
> http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html


*cloink* /me goes to pick my jaw up off the floor. I can insert an I/O scheduler in full flight? OK. I need to adjust my mental image of the world a bit.

I just played with the examples on a test machine and the effect is quite visible. I ran a CVS checkout of the ports collection concurrent with dd writing a massive file. Insert scheduler -> CVS update is faster; destroy scheduler -> CVS update crawls. This is so easy it's almost scary.
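
The insert/destroy cycle described above can be sketched roughly as follows. This is a sketch under assumptions, not the exact commands that were run: it assumes the round-robin algorithm (gsched_rr) and the provider ada3, and the helper names run, sched_insert and sched_destroy are made up for illustration. It prints the commands by default and only executes them when DRY_RUN=0.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
# Set DRY_RUN=0 (as root, on a test machine) to actually run the commands.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Insert a round-robin scheduler in front of an existing provider.
sched_insert() {
    run kldload geom_sched
    run kldload gsched_rr
    run geom sched insert -a rr "$1"
}

# Remove it again; the inserted geom is named "<provider>.sched.".
sched_destroy() {
    run geom sched destroy -v "$1.sched."
}

sched_insert ada3
sched_destroy ada3
```

Keeping the default as a dry run makes it easy to review the exact commands before touching a disk that carries live data.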

The behaviour that Matt describes is what I thought I was seeing too: write a *lot* and it becomes hard to read from the disk. In my system, writing data is largely asynchronous and can lag the actual arrival of data by as much as a few minutes. Reads are always synchronous to a user request and need to be served ASAP. Some writes are database writes and they should be serviced quickly too.

This is definitely something I need to look into. Thank you for the reference.
--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

Dear Doug,

>> I seem to have a problem where really heavy disk I/O is drowning my machine.
>
> Assuming you're using the default scheduler (SCHED_ULE), try switching
> to the 4BSD scheduler in your kernel config file and see if that helps.


I will, thanks for the suggestion.
--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

Dear Gary,

>> # camcontrol devlist
>> at scbus1 target 0 lun 0 (pass0,ada0)
>> at scbus2 target 0 lun 0 (pass1,ada1)
>> at scbus3 target 0 lun 0 (pass2,ada2)
>> at scbus4 target 0 lun 0 (pass3,ada3)
>> at scbus7 target 0 lun 0 (pass4,cd0)
>> at scbus8 target 0 lun 0 (pass5,cd1)
>
> Check the SSD for its internal block size and make sure your filesystem
> and partitions are aligned with the disk block size. Unless there
> is something wrong with your SATA controller, I'd expect a lot more than
> 273 IOPS and ~30 MB/s from an SSD.


Thank you for suggesting this. However, I recently went through my file systems to fix disk alignment. I ended up aligning them to 1M boundaries, which raised the throughput from 6 MB/s to about 60-80 MB/s, which is what I am seeing today.

# gpart show
...
=>       34  250069613  ada3  GPT  (119G)
         34       2014        - free -  (1M)
       2048  250067599     1  freebsd-ufs  (119G)

Do you think I need to revisit alignment?
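
The alignment arithmetic for the gpart output above can be checked with a few lines of shell. This is a minimal sketch that assumes 512-byte logical sectors, which the thread does not state; the function name is_aligned is made up for illustration.

```shell
#!/bin/sh
# Succeeds when a partition's byte offset is a multiple of the alignment.
# $1 = start sector, $2 = sector size in bytes, $3 = alignment in bytes
is_aligned() {
    [ $(( ($1 * $2) % $3 )) -eq 0 ]
}

# Partition 1 on ada3 starts at sector 2048; check against 1 MiB (1048576 bytes).
if is_aligned 2048 512 1048576; then
    echo "aligned"
else
    echo "misaligned"
fi
```

Under that assumption, sector 2048 lands exactly on the 1 MiB boundary, whereas a partition starting at sector 34 (the first usable GPT sector shown above) would not even be 4 KiB aligned.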

--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

Dear Freddie,

>> You may want to play around with gsched, the GEOM Scheduler.
>>
>> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
>> DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
>> order to satisfy write threads (or the other way around?). But
>> adding gsched into the mix helped things immensely, allowing mixed
>> reads/writes to better share disk I/O resources.
>>
>> I'll see if I can dig up a link to his testing e-mail messages.
>
> Here's the post, part of a thread on benchmarking RAID controllers:
>
> http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html

I looked at "sysctl kern.geom.confdot" (another ridiculously useful feature) to see where the scheduler should be placed.

The way I was thinking, I should place a scheduler in such a way that writes to one physical device (ada3 in my case) do not cause reads on another device to stall (e.g. ada2, where the database lives). However, it looks like the GEOM tree is actually a GEOM bush, with a separate tree for each device.

Am I missing something? Is there a way to schedule across devices? Is the bush a tree after all, maybe?
--
Kees Jan

http://java-monitor.com/

+31651838192

The secret of success lies in the stability of the goal. -- Benjamin Disraeli

On Tue, May 29, 2012 at 2:12 PM, Kees Jan Koster <> wrote:
>>> You may want to play around with gsched, the GEOM Scheduler.
>>>
>>> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
>>> DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
>>> order to satisfy write threads (or the other way around?). But
>>> adding gsched into the mix helped things immensely, allowing mixed
>>> reads/writes to better share disk I/O resources.
>>>
>>> I'll see if I can dig up a link to his testing e-mail messages.
>>
>> Here's the post, part of a thread on benchmarking RAID controllers:
>>
>> http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html
>
> I looked at "sysctl kern.geom.confdot" (another ridiculously useful feature) to see where the scheduler should be placed.
>
> The way I was thinking, I should place a scheduler in such a way that writes to one physical device (ada3 in my case) do not cause reads on another device to stall (e.g. ada2, where the database lives). However, it looks like the GEOM tree is actually a GEOM bush, with a separate tree for each device.
>
> Am I missing something? Is there a way to schedule across devices? Is the bush a tree after all, maybe?

There are others much better versed in the ways of GEOM than I, and
hopefully they will jump in to simplify/clarify things. :)

The way I understand things is that GEOM is a per-device stack of GEOM
classes, with the physical device at the bottom, and the VM/block/I/O
(?) system at the top. Thus, unless you use one of the multi-device
GEOM classes (graid, gmirror, gstripe, gvinum), then each stack is
independent of the others.

This means gsched only works within a single stack (i.e., a single device).

Granted, I haven't played with gsched yet (most of our high-I/O
systems are ZFS), so there may be a way to use it across GEOMs.
--
Freddie Cash

Dear Freddie,

> Granted, I haven't played with gsched yet (most of our high-I/O
> systems are ZFS), so there may be a way to use it across GEOMs.

From my previous experiments ZFS suffers the same fate when there is heavy write activity. Reads just don't get served in time.

How do you deal with that?
--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

On Tue, May 29, 2012 at 2:30 PM, Kees Jan Koster <> wrote:
> Dear Freddie,
>
>> Granted, I haven't played with gsched yet (most of our high-I/O
>> systems are ZFS), so there may be a way to use it across GEOMs.
>
> From my previous experiments ZFS suffers the same fate when there is heavy write activity. Reads just don't get served in time.
>
> How do you deal with that?

We're currently only using FreeBSD (and ZFS) on our backup servers.
The two main servers do rsync backups for ~150 remote Linux servers
and FreeBSD firewalls (1 server does the elementary and secondary
schools; the other server does the admin sites). Then they do zfs
sends to a third system off-site.

Thus, our workloads tend to be fairly one-sided (all reads on the zfs
send side; all writes on the zfs recv side; mostly reads on the rsync
side with some writes). And most of our working set fits into
ARC/L2ARC. Cache devices really help, as most reads come from the
L2ARC, while most writes go straight through to the pool.

We're still a year or so away from our ultimate goal of using
FreeBSD+ZFS+NFS to create a separate/proper SAN/NAS tier for our
virtual servers. At that point, we'll look a little deeper into
things, and experiment with different L2ARC/ZIL setups to optimise
read and write paths.


--
Freddie Cash


  #13  
30-05-2012 01:24 AM
Freebsd-stable member admin is online now
User
 

Dear All,

I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.

In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.

To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.

Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to servr the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?

I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?

All file systems are mounted "noatime, soft-updates".

Details:

# uname -a
FreeBSD cumin.java-monitor.com 9.0-STABLE FreeBSD 9.0-STABLE #0: Mon Mar 26 14:30:19 UTC 2012 :/usr/obj/usr/src/sys/CUMIN amd64
# gstat -f 'ada[0-3]$' -b
dT: 1.001s w: 1.000s filter: ada[0-3]$
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
0 0 0 0 0.0 0 0 0.0 0.0 ada0
0 0 0 0 0.0 0 0 0.0 0.0 ada1
0 0 0 0 0.0 0 0 0.0 0.0 ada2
103 273 0 0 0.0 273 34630 2062 121.9 ada3
# camcontrol devlist
at scbus1 target 0 lun 0 (pass0,ada0)
at scbus2 target 0 lun 0 (pass1,ada1)
at scbus3 target 0 lun 0 (pass2,ada2)
at scbus4 target 0 lun 0 (pass3,ada3)
at scbus7 target 0 lun 0 (pass4,cd0)
at scbus8 target 0 lun 0 (pass5,cd1)
# _
--
Kees Jan

http://java-monitor.com/

+31651838192

The secret of success lies in the stability of the goal. -- Benjamin Disraeli

_______________________________________________
freebsd- mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-"
)
On 5/29/2012 12:26 PM, Kees Jan Koster wrote:
> I seem to have a problem where really heavy disk I/O is drowning my machine.

Assuming you're using the default scheduler (SCHED_ULE), try switching
to the 4BSD scheduler in your kernel config file and see if that helps.

Doug

--

This .signature sanitized for your protection
_______________________________________________
freebsd- mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-"
)
On Tue, May 29, 2012 at 12:26 PM, Kees Jan Koster <> wrote:
> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>
> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>
> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>
> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to servr the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>
> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>
> All file systems are mounted "noatime, soft-updates".

You may want to play around with gshed, the GEOM Scheduler.

Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
order to satisfy write threads (or the other way around?). But,
adding gsched into the mix helped things immensely, allowing mixed
reads/writes to better shares disk I/O resources.

I'll see if I can dig up a link to his testing e-mail messages.

--
Freddie Cash

_______________________________________________
freebsd- mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-"
)
On Tue, May 29, 2012 at 12:34 PM, Freddie Cash <> wrote:
> On Tue, May 29, 2012 at 12:26 PM, Kees Jan Koster <> wrote:
>> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>>
>> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>>
>> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>>
>> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to servr the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>>
>> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>>
>> All file systems are mounted "noatime, soft-updates".
>
> You may want to play around with gshed, the GEOM Scheduler.
>
> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
> DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
> order to satisfy write threads (or the other way around?).  But,
> adding gsched into the mix helped things immensely, allowing mixed
> reads/writes to better shares disk I/O resources.
>
> I'll see if I can dig up a link to his testing e-mail messages.

Here's the post, part of a thread on benchmarking RAID controllers:

http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html


--
Freddie Cash

_______________________________________________
freebsd- mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-"
)
On Tue, May 29, 2012 at 09:26:32PM +0200, Kees Jan Koster wrote:
> Dear All,
>
> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>
> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>
> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>
> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to servr the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>
> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>
> All file systems are mounted "noatime, soft-updates".
>
> Details:
>
> # uname -a
> FreeBSD cumin.java-monitor.com 9.0-STABLE FreeBSD 9.0-STABLE #0: Mon Mar 26 14:30:19 UTC 2012 :/usr/obj/usr/src/sys/CUMIN amd64
> # gstat -f 'ada[0-3]$' -b
> dT: 1.001s w: 1.000s filter: ada[0-3]$
> L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
> 0 0 0 0 0.0 0 0 0.0 0.0 ada0
> 0 0 0 0 0.0 0 0 0.0 0.0 ada1
> 0 0 0 0 0.0 0 0 0.0 0.0 ada2
> 103 273 0 0 0.0 273 34630 2062 121.9 ada3
> # camcontrol devlist
> at scbus1 target 0 lun 0 (pass0,ada0)
> at scbus2 target 0 lun 0 (pass1,ada1)
> at scbus3 target 0 lun 0 (pass2,ada2)
> at scbus4 target 0 lun 0 (pass3,ada3)
> at scbus7 target 0 lun 0 (pass4,cd0)
> at scbus8 target 0 lun 0 (pass5,cd1)


Check the SSD for its internal block size and make sure your filesystem
and partitions are aligned with the disk block size. Unless there
is something wrong with your SATA controller I'd expect a lot more than
273 IOPS/sec and ~30MByte/sec from a SSD.

Regards,

Gary
_______________________________________________
freebsd- mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-"
)
Dear Freddie,

>> You may want to play around with gshed, the GEOM Scheduler.
>>
>> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
>> DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
>> order to satisfy write threads (or the other way around?). But,
>> adding gsched into the mix helped things immensely, allowing mixed
>> reads/writes to better shares disk I/O resources.
>>
>> I'll see if I can dig up a link to his testing e-mail messages.
>
> Here's the post, part of a thread on benchmarking RAID controllers:
>
> http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html


*cloink* /me goes to pick my jam off of the floor. I can insert a I/O scheduler in full flight? Ok. I need to adjust my mental image of the world a bit.

I just played with the examples on a test machine and the effect is quite visible. I ran a CVS checkout of the ports collection concurrent with dd writing a massive file. Insert scheduler -> CVS update is faster; destroy scheduler -> CVS update crawls. This is so easy it's almost scary.

The behaviour that Matt describes is what I thought I was seeing too: write a *lot* and it becomes hard to read from the disk. In my system, writing data is largely asynchronous and can lag the actual arrival of data by as much as a few minutes. Reads are always synchronous to a user request and need to be served asap. Some writes are database writes and they should be services quickly too.

This is definitively something I need to look into. Thank you for the reference.
--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

_______________________________________________
freebsd- mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-"
)
Dear Doug,

>> I seem to have a problem where really heavy disk I/O is drowning my machine.
>
> Assuming you're using the default scheduler (SCHED_ULE), try switching
> to the 4BSD scheduler in your kernel config file and see if that helps.


I will, thanks for the suggestion.
--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

_______________________________________________
freebsd- mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-"
)
Dear Gary,

>> # camcontrol devlist
>> at scbus1 target 0 lun 0 (pass0,ada0)
>> at scbus2 target 0 lun 0 (pass1,ada1)
>> at scbus3 target 0 lun 0 (pass2,ada2)
>> at scbus4 target 0 lun 0 (pass3,ada3)
>> at scbus7 target 0 lun 0 (pass4,cd0)
>> at scbus8 target 0 lun 0 (pass5,cd1)
>
> Check the SSD for its internal block size and make sure your filesystem
> and partitions are aligned with the disk block size. Unless there
> is something wrong with your SATA controller I'd expect a lot more than
> 273 IOPS/sec and ~30MByte/sec from a SSD.


Thank you for suggesting this. However, I recently went through my file systems to fix disk alignment. I ended up aligning them to 1M blocks, which raised the throughput from 6M/s to about 60-80MB/s which is what I am seeing today.

# gpart show
...
=> 34 250069613 ada3 GPT (119G)
34 2014 - free - (1M)
2048 250067599 1 freebsd-ufs (119G)

Do you think I need to revisit alignment?

--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

_______________________________________________
freebsd- mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-"
)
Dear Freddie,

>> You may want to play around with gshed, the GEOM Scheduler.
>>
>> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
>> DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
>> order to satisfy write threads (or the other way around?). But,
>> adding gsched into the mix helped things immensely, allowing mixed
>> reads/writes to better shares disk I/O resources.
>>
>> I'll see if I can dig up a link to his testing e-mail messages.
>
> Here's the post, part of a thread on benchmarking RAID controllers:
>
> http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html

I looked at "sysctl kern.geom.confdot" (another ridiculously useful feature) to see where the scheduler should be placed.

The way I was thinking, I should place a scheduler in such a way that writes to one physical device (ada3 in my case) do not cause reads on another device to stall (e.g. ada2, where the database lives). However, it looks like the GEOM tree is actually a GEOM bush, with a separate tree for each device.

Am I missing something? Is there a way to schedule across devices? Is the bush a tree after all, maybe?
--
Kees Jan

http://java-monitor.com/

+31651838192

The secret of success lies in the stability of the goal. -- Benjamin Disraeli

On Tue, May 29, 2012 at 2:12 PM, Kees Jan Koster <> wrote:
>>> You may want to play around with gsched, the GEOM Scheduler.
>>>
>>> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
>>> DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
>>> order to satisfy write threads (or the other way around?).  But,
>>> adding gsched into the mix helped things immensely, allowing mixed
>>> reads/writes to better share disk I/O resources.
>>>
>>> I'll see if I can dig up a link to his testing e-mail messages.
>>
>> Here's the post, part of a thread on benchmarking RAID controllers:
>>
>> http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html
>
> I looked at "sysctl kern.geom.confdot" (another ridiculously useful feature) to see where the scheduler should be placed.
>
> The way I was thinking, I should place a scheduler in such a way that writes to one physical device (ada3 in my case) do not cause reads on another device to stall (e.g. ada2, where the database lives). However, it looks like the GEOM tree is actually a GEOM bush, with a separate tree for each device.
>
> Am I missing something? Is there a way to schedule across devices? Is the bush a tree after all, maybe?

There are others much better versed in the ways of GEOM than I, and
hopefully they will jump in to simplify/clarify things. :)

The way I understand things is that GEOM is a per-device stack of GEOM
classes, with the physical device at the bottom, and the VM/block/I/O
(?) system at the top. Thus, unless you use one of the multi-device
GEOM classes (graid, gmirror, gstripe, gvinum), then each stack is
independent of the others.

Meaning gsched only works for a single stack (i.e., a single device).

Granted, I haven't played with gsched yet (most of our high-I/O
systems are ZFS), so there may be a way to use it across GEOMs.
--
Freddie Cash

Dear Freddie,

> Granted, I haven't played with gsched yet (most of our high-I/O
> systems are ZFS), so there may be a way to use it across-GEOMs.

From my previous experiments ZFS suffers the same fate when there is heavy write activity. Reads just don't get served in time.

How do you deal with that?
--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

On Tue, May 29, 2012 at 2:30 PM, Kees Jan Koster <> wrote:
> Dear Freddie,
>
>> Granted, I haven't played with gsched yet (most of our high-I/O
>> systems are ZFS), so there may be a way to use it across-GEOMs.
>
> From my previous experiments ZFS suffers the same fate when there is heavy write activity. Reads just don't get served in time.
>
> How do you deal with that?

We're currently only using FreeBSD (and ZFS) on our backup servers.
The two main servers do rsync backups for ~150 remote Linux servers
and FreeBSD firewalls (1 server does the elementary and secondary
schools; the other server does the admin sites). Then they do zfs
sends to a third system off-site.

Thus, our workloads tend to be fairly one-sided (all reads on the zfs
send side; all writes on the zfs recv side; mostly reads on the rsync
side with some writes). And, most of our working set fits into
ARC/L2ARC. Cache devices really help, as most reads come from the
L2ARC, while most writes go straight through to the pool.

We're still a year or so away from our ultimate goal of using
FreeBSD+ZFS+NFS to create a separate/proper SAN/NAS tier for our
virtual servers. At that point, we'll look a little deeper into
things, and experiment with different L2ARC/ZIL setups to optimise
read and write paths.


--
Freddie Cash

On Tue, May 29, 2012 at 10:59:58PM +0200, Kees Jan Koster wrote:
> Dear Gary,
>
> >> # camcontrol devlist
> >> at scbus1 target 0 lun 0 (pass0,ada0)
> >> at scbus2 target 0 lun 0 (pass1,ada1)
> >> at scbus3 target 0 lun 0 (pass2,ada2)
> >> at scbus4 target 0 lun 0 (pass3,ada3)
> >> at scbus7 target 0 lun 0 (pass4,cd0)
> >> at scbus8 target 0 lun 0 (pass5,cd1)
> >
> > Check the SSD for its internal block size and make sure your filesystem
> > and partitions are aligned with the disk block size. Unless there
> > is something wrong with your SATA controller I'd expect a lot more than
> > 273 IOPS/sec and ~30MByte/sec from a SSD.
>
>
> Thank you for suggesting this. However, I recently went through my file systems to fix disk alignment. I ended up aligning them to 1M blocks, which raised the throughput from 6M/s to about 60-80MB/s which is what I am seeing today.
>
> # gpart show
> ...
> =>        34  250069613  ada3  GPT  (119G)
>           34       2014        - free -  (1M)
>         2048  250067599     1  freebsd-ufs  (119G)
>
> Do you think I need to revisit alignment?

I don't have the specific device you have, but looking at the test results
from a random site for the same drive and firmware, they got 465 random IOPS
for a 0.5KB block size and a lot more than 60-80MB/sec. I get 60-80MB/sec
from a WD green drive in a pure write situation (admittedly using ZFS),
so I'm a bit surprised you're seeing similar performance from your SSD,
although now that I look at it, the drive appears to be an older model. It could
be that you're running into issues where the drive is working hard as
all the flash blocks need to be erased before reuse. You may get some
improvement if you tweak the filesystem block size to the SSD block size.
TRIM may also help if the drive supports it.
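
To make the TRIM suggestion concrete, here is a minimal sketch. It assumes the UFS file system lives on /dev/ada3p1 (partition 1 from the gpart output quoted above) and that it can be unmounted while tunefs runs; the mount point /data is a made-up placeholder. The script only prints the commands unless APPLY=1 is set.

```sh
#!/bin/sh
# Sketch only: device name taken from this thread, mount point is hypothetical.
DEV="${DEV:-ada3}"
PART="${PART:-${DEV}p1}"
MNT="${MNT:-/data}"

# Print each command; execute it only when APPLY=1.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

trim_plan() {
    # Does the drive advertise TRIM? Look for the TRIM line in the output.
    run camcontrol identify "$DEV"
    # tunefs needs the file system unmounted (or mounted read-only).
    run umount "$MNT"
    # -t enable turns on the UFS TRIM flag; -p prints the current settings.
    run tunefs -t enable "/dev/$PART"
    run tunefs -p "/dev/$PART"
    run mount "/dev/$PART" "$MNT"
}

trim_plan
```

Whether this helps depends on the drive; on a controller that handles TRIM poorly it may make no difference at all.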

Regards,

Gary

  #14  
10-06-2012 08:22 PM

Dear All,

I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.

In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.

To clarify, the data set used to live on ada2 (see the devlist below), which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB per second being written (it is mostly-write data) but has not changed the hangs. If anything, they have gotten worse since.

Using gstat I notice that I/O service time is quite high. From the gstat output below you can see that it takes just over 2s to serve the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?

I should mention that all data is referenced through cross-mountpoint symlinks. Would that make a difference? Should I use canonical paths in the code instead?

All file systems are mounted "noatime, soft-updates".

Details:

# uname -a
FreeBSD cumin.java-monitor.com 9.0-STABLE FreeBSD 9.0-STABLE #0: Mon Mar 26 14:30:19 UTC 2012 :/usr/obj/usr/src/sys/CUMIN amd64
# gstat -f 'ada[0-3]$' -b
dT: 1.001s w: 1.000s filter: ada[0-3]$
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      0      0      0    0.0      0      0    0.0     0.0 ada0
    0      0      0      0    0.0      0      0    0.0     0.0 ada1
    0      0      0      0    0.0      0      0    0.0     0.0 ada2
  103    273      0      0    0.0    273  34630   2062   121.9 ada3
# camcontrol devlist
at scbus1 target 0 lun 0 (pass0,ada0)
at scbus2 target 0 lun 0 (pass1,ada1)
at scbus3 target 0 lun 0 (pass2,ada2)
at scbus4 target 0 lun 0 (pass3,ada3)
at scbus7 target 0 lun 0 (pass4,cd0)
at scbus8 target 0 lun 0 (pass5,cd1)
# _
--
Kees Jan

http://java-monitor.com/

+31651838192

The secret of success lies in the stability of the goal. -- Benjamin Disraeli

On 5/29/2012 12:26 PM, Kees Jan Koster wrote:
> I seem to have a problem where really heavy disk I/O is drowning my machine.

Assuming you're using the default scheduler (SCHED_ULE), try switching
to the 4BSD scheduler in your kernel config file and see if that helps.
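
For anyone wanting to try this, a minimal sketch of the kernel config change follows. It assumes the custom config is the CUMIN file implied by the uname output, kept in the conventional /usr/src/sys/amd64/conf directory, and that it currently carries the stock SCHED_ULE line.

```
# /usr/src/sys/amd64/conf/CUMIN (path is an assumption based on the uname output)
#options 	SCHED_ULE		# ULE scheduler (the default)
options 	SCHED_4BSD		# 4BSD scheduler
```

Running `sysctl kern.sched.name` before and after shows which scheduler the running kernel uses. The change only takes effect after the usual `make buildkernel KERNCONF=CUMIN`, `make installkernel KERNCONF=CUMIN` and a reboot.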

Doug

--

This .signature sanitized for your protection
On Tue, May 29, 2012 at 12:26 PM, Kees Jan Koster <> wrote:
> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>
> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>
> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>
> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to serve the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>
> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>
> All file systems are mounted "noatime, soft-updates".

You may want to play around with gsched, the GEOM Scheduler.

Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
order to satisfy write threads (or the other way around?). But,
adding gsched into the mix helped things immensely, allowing mixed
reads/writes to better share disk I/O resources.

I'll see if I can dig up a link to his testing e-mail messages.

--
Freddie Cash

On Tue, May 29, 2012 at 12:34 PM, Freddie Cash <> wrote:
> On Tue, May 29, 2012 at 12:26 PM, Kees Jan Koster <> wrote:
>> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>>
>> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>>
>> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>>
>> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to serve the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>>
>> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>>
>> All file systems are mounted "noatime, soft-updates".
>
> You may want to play around with gsched, the GEOM Scheduler.
>
> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
> DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
> order to satisfy write threads (or the other way around?).  But,
> adding gsched into the mix helped things immensely, allowing mixed
> reads/writes to better share disk I/O resources.
>
> I'll see if I can dig up a link to his testing e-mail messages.

Here's the post, part of a thread on benchmarking RAID controllers:

http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html


--
Freddie Cash

On Tue, May 29, 2012 at 09:26:32PM +0200, Kees Jan Koster wrote:
> Dear All,
>
> I seem to have a problem where really heavy disk I/O is drowning my machine. I see hangs in the shell where I am logged on using ssh. Network connections get dropped for no apparent reason and some HTTP requests are served really slowly. Profiling the app code shows that the hangs are in completely random places. Operations that are no more than a few lines of code apart suddenly take seconds to complete.
>
> In my search I seem to find that my machine is quite slow on the disk. I find that rather odd, given that the device in question is an SSD drive and it is a good bit faster than the WD drive that used to carry the data set that is accessed heavily. This drive is doing 1.5 times the throughput, but the hangs have not gone away.
>
> To clarify, the data set used to live on ada2 (see the devlist below) which is a spinning disk. When I experienced intermittent hangs I plugged in an SSD drive (ada3 on the devlist) and moved the data there. This improved the MB's per second that are being written (it is mostly-write data) but has not changed the hangs. If anything, they got worse since.
>
> Using gstat I notice that I/O service time is quite high. From the gstat below you can see that it takes just over 2s to serve the requests. The L(q) seems to never drop far below 100 and %busy hovers around 100% all day long. Can someone please help me troubleshoot that further? What can I do to make the underlying problem visible?
>
> I should mention all data is referenced through cross-mountpoint symlinks, would that make a difference? Should I use canonical paths in the code instead?
>
> All file systems are mounted "noatime, soft-updates".
>
> Details:
>
> # uname -a
> FreeBSD cumin.java-monitor.com 9.0-STABLE FreeBSD 9.0-STABLE #0: Mon Mar 26 14:30:19 UTC 2012 :/usr/obj/usr/src/sys/CUMIN amd64
> # gstat -f 'ada[0-3]$' -b
> dT: 1.001s w: 1.000s filter: ada[0-3]$
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      0      0      0    0.0      0      0    0.0     0.0 ada0
>     0      0      0      0    0.0      0      0    0.0     0.0 ada1
>     0      0      0      0    0.0      0      0    0.0     0.0 ada2
>   103    273      0      0    0.0    273  34630   2062   121.9 ada3
> # camcontrol devlist
> at scbus1 target 0 lun 0 (pass0,ada0)
> at scbus2 target 0 lun 0 (pass1,ada1)
> at scbus3 target 0 lun 0 (pass2,ada2)
> at scbus4 target 0 lun 0 (pass3,ada3)
> at scbus7 target 0 lun 0 (pass4,cd0)
> at scbus8 target 0 lun 0 (pass5,cd1)


Check the SSD for its internal block size and make sure your filesystem
and partitions are aligned with the disk block size. Unless there
is something wrong with your SATA controller I'd expect a lot more than
273 IOPS and ~30MByte/sec from an SSD.

Regards,

Gary
_______________________________________________
freebsd- mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-"
)
Dear Freddie,

>> You may want to play around with gsched, the GEOM Scheduler.
>>
>> Matt Dillon did a bunch of tests comparing FreeBSD+UFS to
>> DragonflyBSD+HAMMER and found that FreeBSD starves read threads in
>> order to satisfy write threads (or the other way around?). But,
>> adding gsched into the mix helped things immensely, allowing mixed
>> reads/writes to better share disk I/O resources.
>>
>> I'll see if I can dig up a link to his testing e-mail messages.
>
> Here's the post, part of a thread on benchmarking RAID controllers:
>
> http://leaf.dragonflybsd.org/mailarchive/kernel/2011-07/msg00034.html


*cloink* /me goes to pick my jaw off of the floor. I can insert an I/O scheduler in full flight? Ok. I need to adjust my mental image of the world a bit.

I just played with the examples on a test machine and the effect is quite visible. I ran a CVS checkout of the ports collection concurrent with dd writing a massive file. Insert scheduler -> CVS update is faster; destroy scheduler -> CVS update crawls. This is so easy it's almost scary.
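
For reference, the insert/destroy cycle from that test looks roughly like the sketch below. It follows the example in gsched(8); the provider name ada0 is a stand-in for the test machine's disk, which is not named here, and the script only prints the commands unless APPLY=1 is set.

```sh
#!/bin/sh
# Sketch only: ada0 is a placeholder provider, not the disk from the test above.
DISK="${DISK:-ada0}"

# Print each command; execute it only when APPLY=1 (needs root on FreeBSD).
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

gsched_cycle() {
    # Load the GEOM scheduler framework and the round-robin algorithm.
    run kldload geom_sched
    run kldload gsched_rr
    # Insert the scheduler transparently on an already active provider.
    run geom sched insert -a rr "$DISK"
    # ... run the mixed read/write workload here and watch gstat ...
    # Remove the scheduler again; the geom is named <provider>.sched.
    run geom sched destroy -v "${DISK}.sched."
}

gsched_cycle
```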

The behaviour that Matt describes is what I thought I was seeing too: write a *lot* and it becomes hard to read from the disk. In my system, writing data is largely asynchronous and can lag the actual arrival of data by as much as a few minutes. Reads are always synchronous to a user request and need to be served asap. Some writes are database writes and they should be serviced quickly too.

This is definitely something I need to look into. Thank you for the reference.
--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

Dear Doug,

>> I seem to have a problem where really heavy disk I/O is drowning my machine.
>
> Assuming you're using the default scheduler (SCHED_ULE), try switching
> to the 4BSD scheduler in your kernel config file and see if that helps.


I will, thanks for the suggestion.
--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

Dear Gary,

>> # camcontrol devlist
>> at scbus1 target 0 lun 0 (pass0,ada0)
>> at scbus2 target 0 lun 0 (pass1,ada1)
>> at scbus3 target 0 lun 0 (pass2,ada2)
>> at scbus4 target 0 lun 0 (pass3,ada3)
>> at scbus7 target 0 lun 0 (pass4,cd0)
>> at scbus8 target 0 lun 0 (pass5,cd1)
>
> Check the SSD for its internal block size and make sure your filesystem
> and partitions are aligned with the disk block size. Unless there
> is something wrong with your SATA controller I'd expect a lot more than
> 273 IOPS/sec and ~30MByte/sec from a SSD.


Thank you for suggesting this. However, I recently went through my file systems to fix disk alignment. I ended up aligning them to 1M blocks, which raised the throughput from 6MB/s to about 60-80MB/s, which is what I am seeing today.

# gpart show
...
=>        34  250069613  ada3  GPT  (119G)
          34       2014        - free -  (1M)
        2048  250067599     1  freebsd-ufs  (119G)

Do you think I need to revisit alignment?
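
As a quick sanity check on the numbers above, the sketch below converts the partition's start sector to bytes and tests it against a few candidate boundaries. It assumes 512-byte logical sectors, which gpart does not show here, and the candidate sizes (4K page, 128K and 1M erase block) are illustrative guesses rather than values from this drive's data sheet.

```sh
#!/bin/sh
# Start sector of ada3p1 from the gpart output; sector size is an assumption.
START_SECTOR=2048
SECTOR_BYTES=512

# Print "aligned" or "misaligned" for a start sector against a boundary in bytes.
check_alignment() {
    offset=$(( $1 * SECTOR_BYTES ))
    if [ $(( offset % $2 )) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"
    fi
}

for boundary in 4096 131072 1048576; do
    echo "start ${START_SECTOR} vs ${boundary}-byte boundary: $(check_alignment "$START_SECTOR" "$boundary")"
done
```

Sector 2048 is a whole multiple of all three sizes. For contrast, a partition starting at sector 34 (the first usable sector on this disk) would sit at byte 17408, which is not even a multiple of 4096.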

--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.

Dear All,

It has been a while since I worked on this, so I thought I'd send out an update. I found out I had two related issues: one is seemingly random hangs that appear to have their root in disk I/O, and the other is that network connections are not being served quickly enough because of those hangs.

For the latter issue, I learned that by raising kern.ipc.somaxconn I could make the system buffer the connections long enough so that the application could accept all of them.
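
A minimal sketch of that change, for anyone hitting the same symptom. The value 1024 is only an example (the value actually used is not stated here); the stock default is 128.

```
# /etc/sysctl.conf: keep the larger listen queue limit across reboots
kern.ipc.somaxconn=1024
```

The same setting can be applied at run time with `sysctl kern.ipc.somaxconn=1024`, and `netstat -Lan` shows the current listen queue depth per socket. Note that the sysctl only caps the backlog an application asks for in listen(2), so the application may need to request a larger backlog as well.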

The effect is that now my application runs smoothly again, although there are still lots of unexplained things about this system's I/O load.

Next steps for me are to move the code around a bit to change the way my application uses the disk. There is still some buffering I can do before writing and I can move a small part of the I/O off to another spindle. So while I am still not sure what is going on I will focus on my own code a bit before I return to tuning FreeBSD for this workload.

Thanks to all who contributed to this thread.

Kees Jan


On 30 May 2012, at 02:24, Gary Palmer wrote:

> On Tue, May 29, 2012 at 10:59:58PM +0200, Kees Jan Koster wrote:
>> Dear Gary,
>>
>>>> # camcontrol devlist
>>>> at scbus1 target 0 lun 0 (pass0,ada0)
>>>> at scbus2 target 0 lun 0 (pass1,ada1)
>>>> at scbus3 target 0 lun 0 (pass2,ada2)
>>>> at scbus4 target 0 lun 0 (pass3,ada3)
>>>> at scbus7 target 0 lun 0 (pass4,cd0)
>>>> at scbus8 target 0 lun 0 (pass5,cd1)
>>>
>>> Check the SSD for its internal block size and make sure your filesystem
>>> and partitions are aligned with the disk block size. Unless there
>>> is something wrong with your SATA controller I'd expect a lot more than
>>> 273 IOPS/sec and ~30MByte/sec from a SSD.
>>
>>
>> Thank you for suggesting this. However, I recently went through my file systems to fix disk alignment. I ended up aligning them to 1M blocks, which raised the throughput from 6M/s to about 60-80MB/s which is what I am seeing today.
>>
>> # gpart show
>> ...
>> => 34 250069613 ada3 GPT (119G)
>> 34 2014 - free - (1M)
>> 2048 250067599 1 freebsd-ufs (119G)
>>
>> Do you think I need to revisit alignment?
>
> I don't have the specific device you have, but looking at the test results
> from a random site for the same drive and firmware, they got 465 random IOPS
> for a 0.5KB block size and a lot more than 60-80MB/sec. I get 60-80MB/sec
> from a WD green drive in a pure write situation (admitedly using ZFS),
> so I'm a bit surprised you're seeing similar performance from your SSD,
> although now I look at it, the drive appears to be an older model. It could
> be that you're running into issues where the drive is working hard as
> all the flash blocks need to be erased before reuse. You may get some
> improvement if you tweak the filesystem block size to the SSD block size.
> TRIM may also help if the drive supports it.
>
> Regards,
>
> Gary


--
Kees Jan

http://java-monitor.com/

+31651838192

Change is good. Granted, it is good in retrospect, but change is good.
