PCI Express card driver load and unload take too much time (up to 30 minutes)
Steevan Rodrigues
2018-09-21 04:02:35 UTC
Hello Folks,

We have a PCI Express card for data processing that achieves 25 to 30 Gbps.
Recently we have been facing an issue on one of the servers at a customer
site.

System information:
Supermicro X11DPH-TQ motherboard
hw.model: Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz (dual CPU, 20 cores
total)
16 GB RAM

On this system the same PCIe card works fine under RHEL 7.5, and driver
load and unload also work fine. Our driver usually takes about 20 seconds
to load and approximately 20 seconds to unload.

However, on this particular system under FreeBSD (tried with 11.1 and 11.2
Release) it takes about 2 to 4 minutes to load and about 8 to 30 minutes
to unload.
During unload the system appears to freeze completely, and I cannot run
any commands to find out what is happening.

The same driver works fine on our lab servers, Dell T620 (Xeon, 12 cores),
and on desktops with FreeBSD 10.4, 11.1 and 11.2.

I also ran a couple of Phoronix tests on this Supermicro server. One
particular test takes far too long: in the OsBench suite, the thread
creation test reports an average thread creation time of 9000 usec. On
the servers and desktops in our lab, the same thread creation test takes
only 20 to 30 usec.

Any pointers about what could be wrong with the system or our PCIe card
driver?

Thanks
Steevan
Steevan Rodrigues
2018-10-04 09:50:21 UTC
Hi,

After some investigation I came across the following email thread:
http://freebsd.1045724.x6.nabble.com/FreeBSD-11-1-contigfree-performance-issue-td6247284.html

The issue I am seeing is the same as the one described in that thread. We
found that our driver also spends too much time in contigfree and
contigmalloc calls. This happens only on the server with dual Xeon CPUs
with 10 cores each (20 cores / 40 threads total).
The issue is not seen on the server with dual Xeon CPUs with 6 cores each
(12 cores / 24 threads total).

Any suggestions on how to overcome this problem with contigmalloc and
contigfree?
Thanks
Steevan
Ryan Stone
2018-10-05 18:00:59 UTC
I'm not sure why contigfree() would be taking a long time, but
contigmalloc() can be extraordinarily expensive when it needs to
defragment memory to meet the request. Does your application really
require a lot of physically contiguous memory? If you can restructure
to not require contigmalloc() at all -- maybe by using S/G DMA -- you may
find your life significantly easier.
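
For readers unfamiliar with scatter/gather (S/G) DMA, the idea is that the device is handed a list of (address, length) segments instead of one large physically contiguous buffer, so each segment only needs to be contiguous within a page. The following is a userspace illustration of building such a list, not FreeBSD kernel code: `sg_segment`, `build_sg_list` and `SG_PAGE_SIZE` are hypothetical names, and the 4096-byte page size is an assumption. In a real FreeBSD driver the bus_dma(9) framework produces the segment list, and the card must be able to consume S/G descriptors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SG_PAGE_SIZE 4096u  /* assumed page size for this illustration */

/* One scatter/gather segment: a start address and a length in bytes. */
struct sg_segment {
    uintptr_t addr;
    size_t    len;
};

/*
 * Split the range [addr, addr + len) into segments that never cross a
 * page boundary. Returns the number of segments written, or -1 if
 * `max_segs` is too small to describe the whole range.
 */
int build_sg_list(uintptr_t addr, size_t len,
                  struct sg_segment *segs, int max_segs)
{
    int n = 0;

    assert(segs != NULL);
    while (len > 0) {
        /* Bytes left before the next page boundary. */
        size_t room = SG_PAGE_SIZE - (size_t)(addr % SG_PAGE_SIZE);
        size_t chunk = len < room ? len : room;

        if (n == max_segs)
            return -1;
        segs[n].addr = addr;
        segs[n].len = chunk;
        n++;
        addr += chunk;
        len -= chunk;
    }
    return n;
}
```

A 10000-byte buffer starting 100 bytes into a page is split into three segments (3996, 4096 and 1908 bytes), none of which needs to be physically adjacent to the others. That is why this approach avoids the memory defragmentation cost mentioned above.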
Steevan Rodrigues
2018-10-27 06:52:00 UTC
Yes, it is contigfree that is taking too much time. (contigfree takes
almost 5x to 10x longer than contigmalloc.)
However, I see this issue only on a Supermicro server with dual Xeon Gold
CPUs with 10 cores each (20 cores total).

So I am wondering whether FreeBSD has performance issues on multicore
servers (more than 16 cores)?
Has anyone come across issues like this?

Thanks
Steevan
Steevan Rodrigues
2018-11-19 12:48:59 UTC
I tried FreeBSD 12 Beta 3 on this server with the Xeon Gold 5115 CPU.
All these problems have disappeared.

On FreeBSD 11.x, contigfree was taking too much time on my server with
the Xeon Gold 5115 CPU.

On FreeBSD 12 Beta 3, contigfree still takes much longer than
contigmalloc.
However, compared with the FreeBSD 11.x numbers, I see a huge improvement
in FreeBSD 12 Beta 3.
Because of this contigfree issue, my driver unload used to take 5 to 20
minutes on FreeBSD 11.x.
Now my driver takes only a few seconds to load and a few seconds to unload
on FreeBSD 12 Beta 3.

Hence it looks like the problem was specific to FreeBSD 11.x.

Regards,
Steevan