Discussion:
Code with Apache-2.0 in /usr/src
Adhemerval Zanella
2018-05-28 18:33:47 UTC
Hi all,

I am evaluating adding the ARM optimized math routines [1] to the FreeBSD msun
library and I would like to check what the policy is regarding licensing of
code in /usr/src. The optimized math code is licensed under Apache-2.0 and,
as far as I know, it should be compatible with the FreeBSD BSD license. Is
there anything impeding use of the code as-is in msun, or do I need to check
with the author about re-licensing?

[1] https://github.com/ARM-software/optimized-routines
Konstantin Belousov
2018-05-28 19:04:44 UTC
Post by Adhemerval Zanella
Hi all,
I am evaluating adding the ARM optimized math routines [1] to the FreeBSD msun
library and I would like to check what the policy is regarding licensing of
code in /usr/src. The optimized math code is licensed under Apache-2.0 and,
as far as I know, it should be compatible with the FreeBSD BSD license. Is
there anything impeding use of the code as-is in msun, or do I need to check
with the author about re-licensing?
[1] https://github.com/ARM-software/optimized-routines
Are you asking only about the license's compatibility, or is your intent
to contribute the ARM libm code to the FreeBSD project?
Adhemerval Zanella
2018-05-28 19:08:49 UTC
Post by Konstantin Belousov
Post by Adhemerval Zanella
Hi all,
I am evaluating adding the ARM optimized math routines [1] to the FreeBSD msun
library and I would like to check what the policy is regarding licensing of
code in /usr/src. The optimized math code is licensed under Apache-2.0 and,
as far as I know, it should be compatible with the FreeBSD BSD license. Is
there anything impeding use of the code as-is in msun, or do I need to check
with the author about re-licensing?
[1] https://github.com/ARM-software/optimized-routines
Are you asking only about the license's compatibility, or is your intent
to contribute the ARM libm code to the FreeBSD project?
My intention is to contribute the ARM libm code to the FreeBSD msun library.
Steve Kargl
2018-05-28 19:35:06 UTC
Post by Adhemerval Zanella
Post by Konstantin Belousov
Post by Adhemerval Zanella
Hi all,
I am evaluating adding the ARM optimized math routines [1] to the FreeBSD msun
library and I would like to check what the policy is regarding licensing of
code in /usr/src. The optimized math code is licensed under Apache-2.0 and,
as far as I know, it should be compatible with the FreeBSD BSD license. Is
there anything impeding use of the code as-is in msun, or do I need to check
with the author about re-licensing?
[1] https://github.com/ARM-software/optimized-routines
Are you asking only about the license's compatibility, or is your intent
to contribute the ARM libm code to the FreeBSD project?
My intention is to contribute the ARM libm code to the FreeBSD msun library.
The above URL seems to contain only single precision code,
e.g., sinf(x). What benefit does this code have over the
current implementations of these functions? Doesn't ARM
support at least a double precision type? Why have
algorithms for single precision that differ from the
algorithms at higher precision?
--
Steve
Adhemerval Zanella
2018-05-28 19:47:21 UTC
Post by Steve Kargl
Post by Adhemerval Zanella
Post by Konstantin Belousov
Post by Adhemerval Zanella
Hi all,
I am evaluating adding the ARM optimized math routines [1] to the FreeBSD msun
library and I would like to check what the policy is regarding licensing of
code in /usr/src. The optimized math code is licensed under Apache-2.0 and,
as far as I know, it should be compatible with the FreeBSD BSD license. Is
there anything impeding use of the code as-is in msun, or do I need to check
with the author about re-licensing?
[1] https://github.com/ARM-software/optimized-routines
Are you asking only about the license's compatibility, or is your intent
to contribute the ARM libm code to the FreeBSD project?
My intention is to contribute the ARM libm code to the FreeBSD msun library.
The above URL seems to contain only single precision code,
e.g., sinf(x). What benefit does this code have over the
current implementations of these functions? Doesn't ARM
support at least a double precision type?
Yes, the github repository only contains single precision implementations, and
at the moment my idea is to contribute expf, powf, logf, exp2f, and
log2f. All these implementations are faster than the current FreeBSD ones (I
plan to go into more detail in the patch proposal).
Post by Steve Kargl
Why have
algorithms for single precision that differ from the
algorithms at higher precision?
Are you asking why there is one implementation for single precision and another
for double and/or long double (where applicable), or why a different
mathematical method is used for each one? If the latter, my understanding is
that each precision requires a different mathematical analysis for the latency
and error tradeoff. I do not have much experience with the msun implementations
for double precision, nor have I experimented with using these algorithms for
double precision, so I cannot really tell why they could not be used for double
precision as well (my wild guess is that these float implementations were not
tuned for the error bounds required at double precision).

As a side note, the same implementations were recently pushed to glibc [1],
and glibc also uses similar implementations derived from Sun's code.

[1] https://www.gnu.org/software/libc/
Steve Kargl
2018-05-28 20:21:17 UTC
Post by Adhemerval Zanella
Post by Steve Kargl
The above URL seems to contain only single precision code,
e.g., sinf(x). What benefit does this code have over the
current implementations of these functions? Doesn't ARM
support at least a double precision type?
Yes, the github repository only contains single precision implementations, and
at the moment my idea is to contribute expf, powf, logf, exp2f, and
log2f. All these implementations are faster than the current FreeBSD ones (I
plan to go into more detail in the patch proposal).
Post by Steve Kargl
Why have
algorithms for single precision that differ from the
algorithms at higher precision?
Are you asking why there is one implementation for single precision and another
for double and/or long double (where applicable), or why a different
mathematical method is used for each one?
Your question doesn't make any sense to me. My point is that
if you only have ARM-specific single precision routines, then the
underlying algorithms for those SP routines will by definition be
different from the double and long double precision routines. One
can, for example, do 'diff -u s_sinf.c s_sin.c' while debugging.
The differences that one sees are usually restricted to different
numerical literal constants and the number of terms in the polynomial
approximations.
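
For illustration, a minimal sketch of that kind of difference (hypothetical
kernels using plain Taylor coefficients; the real msun kernels use carefully
derived minimax coefficients, so this only shows the shape of the difference):

/* Float kernel for sin(x) on [-pi/4, pi/4]: fewer terms are enough. */
static float
kernel_sinf_sketch(float x)
{
	double z = (double)x * x;
	double p = 1.0 + z * (-1.0 / 6 + z * (1.0 / 120 + z * (-1.0 / 5040
	    + z * (1.0 / 362880))));
	return ((float)((double)x * p));
}

/* Double kernel: same structure, more terms and longer constants. */
static double
kernel_sin_sketch(double x)
{
	double z = x * x;
	double p = 1.0 + z * (-1.0 / 6 + z * (1.0 / 120 + z * (-1.0 / 5040
	    + z * (1.0 / 362880 + z * (-1.0 / 39916800
	    + z * (1.0 / 6227020800.0))))));
	return (x * p);
}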
--
Steve
Adhemerval Zanella
2018-05-28 20:47:09 UTC
Post by Steve Kargl
Post by Adhemerval Zanella
Post by Steve Kargl
The above URL seems to contain only single precision code,
e.g., sinf(x). What benefit does this code have over the
current implementations of these functions? Doesn't ARM
support at least a double precision type?
Yes, the github repository only contains single precision implementations, and
at the moment my idea is to contribute expf, powf, logf, exp2f, and
log2f. All these implementations are faster than the current FreeBSD ones (I
plan to go into more detail in the patch proposal).
Post by Steve Kargl
Why have
algorithms for single precision that differ from the
algorithms at higher precision?
Are you asking why there is one implementation for single precision and another
for double and/or long double (where applicable), or why a different
mathematical method is used for each one?
Your question doesn't make any sense to me. My point is that
if you only have ARM-specific single precision routines, then the
underlying algorithms for those SP routines will by definition be
different from the double and long double precision routines. One
can, for example, do 'diff -u s_sinf.c s_sin.c' while debugging.
The differences that one sees are usually restricted to different
numerical literal constants and the number of terms in the polynomial
approximations.
Sorry if I was not clear; I did not fully get your question. Also, to avoid
further misconceptions, this new implementation is *not* ARM-specific, but
rather a different one which is faster than the current FreeBSD code (in fact
faster on x86 as well).

And is having a different algorithm for single and double precision
a blocker for a future patch proposal?
Steve Kargl
2018-05-28 21:09:07 UTC
Post by Adhemerval Zanella
Post by Steve Kargl
Post by Adhemerval Zanella
Post by Steve Kargl
The above URL seems to contain only single precision code,
e.g., sinf(x). What benefit does this code have over the
current implementations of these functions? Doesn't ARM
support at least a double precision type?
Yes, the github repository only contains single precision implementations, and
at the moment my idea is to contribute expf, powf, logf, exp2f, and
log2f. All these implementations are faster than the current FreeBSD ones (I
plan to go into more detail in the patch proposal).
Post by Steve Kargl
Why have
algorithms for single precision that differ from the
algorithms at higher precision?
Are you asking why there is one implementation for single precision and another
for double and/or long double (where applicable), or why a different
mathematical method is used for each one?
Your question doesn't make any sense to me. My point is that
if you only have ARM-specific single precision routines, then the
underlying algorithms for those SP routines will by definition be
different from the double and long double precision routines. One
can, for example, do 'diff -u s_sinf.c s_sin.c' while debugging.
The differences that one sees are usually restricted to different
numerical literal constants and the number of terms in the polynomial
approximations.
Sorry if I was not clear; I did not fully get your question. Also, to avoid
further misconceptions, this new implementation is *not* ARM-specific, but
rather a different one which is faster than the current FreeBSD code (in fact
faster on x86 as well).
And is having a different algorithm for single and double precision
a blocker for a future patch proposal?
No. Given the comment in sinf.c that max ULP is 0.56072, I do note that
the current implementation of sinf in lib/msun is more accurate (for
interesting values of x). I also looked at single/s_sincosf.c. It is
rather dubious to have 80+ digit numerical constants for a float, which
at most has 9 relevant digits.
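
For what it's worth, a tiny (hypothetical) check showing that, for a typical
constant like pi, the digits beyond roughly the 9th do not change the float
that results from the conversion:

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	/* The same constant written with 9 significant digits and with many more. */
	float short_form = strtof("3.14159265", NULL);
	float long_form =
	    strtof("3.14159265358979323846264338327950288419716939937510", NULL);

	printf("%a %a -> %s\n", (double)short_form, (double)long_form,
	    short_form == long_form ? "same float" : "different floats");
	return (0);
}

Both strings convert to the same float here, so the extra digits at best
document how the constant was derived.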
--
Steve
Adhemerval Zanella
2018-05-28 21:12:13 UTC
Post by Steve Kargl
Post by Adhemerval Zanella
Post by Steve Kargl
Post by Adhemerval Zanella
Post by Steve Kargl
The above URL seems to contain only single precision code,
e.g., sinf(x). What benefit does this code have over the
current implementations of these functions? Doesn't ARM
support at least a double precision type?
Yes, the github repository only contains single precision implementations, and
at the moment my idea is to contribute expf, powf, logf, exp2f, and
log2f. All these implementations are faster than the current FreeBSD ones (I
plan to go into more detail in the patch proposal).
Post by Steve Kargl
Why have
algorithms for single precision that differ from the
algorithms at higher precision?
Are you asking why there is one implementation for single precision and another
for double and/or long double (where applicable), or why a different
mathematical method is used for each one?
Your question doesn't make any sense to me. My point is that
if you only have ARM-specific single precision routines, then the
underlying algorithms for those SP routines will by definition be
different from the double and long double precision routines. One
can, for example, do 'diff -u s_sinf.c s_sin.c' while debugging.
The differences that one sees are usually restricted to different
numerical literal constants and the number of terms in the polynomial
approximations.
Sorry if I was not clear; I did not fully get your question. Also, to avoid
further misconceptions, this new implementation is *not* ARM-specific, but
rather a different one which is faster than the current FreeBSD code (in fact
faster on x86 as well).
And is having a different algorithm for single and double precision
a blocker for a future patch proposal?
No. Given the comment in sinf.c that max ULP is 0.56072, I do note that
the current implementation of sinf in lib/msun is more accurate (for
interesting values of x). I also looked at single/s_sincosf.c. It is
rather dubious to have 80+ digit numerical constants for a float, which
at most has 9 relevant digits.
Also keep in mind that my initial idea is to propose patches only for expf,
powf, logf, exp2f, and log2f.
Steve Kargl
2018-05-28 22:18:19 UTC
Post by Adhemerval Zanella
Post by Steve Kargl
Post by Adhemerval Zanella
And is having a different algorithm for single and double precision
a blocker for a future patch proposal?
No. Given the comment in sinf.c that max ULP is 0.56072, I do note that
the current implementation of sinf in lib/msun is more accurate (for
interesting values of x). I also looked at single/s_sincosf.c. It is
rather dubious to have 80+ digit numerical constants for a float, which
at most has 9 relevant digits.
Also keep in mind that my initial idea is to propose patches only for expf,
powf, logf, exp2f, and log2f.
OK, so I peeked at expf. Comment claims max ulp of 0.502.
Exhaustive testing for normal numbers in the relevant range for
the current implementation of expf(x) shows

Interval tested: [-18,88.72]
ULP: 0.90951, x = -5.19804668e+00f, /* 0xc0a65666 */
flt = 5.52735012e-03f, /* 0x3bb51ec6 */
dbl = 5.5273505437686398e-03, /* 0x3f76a3d8, 0xdd1aae8e */

But, then one looks at implementation details. msun's current
implementation is written in terms of single precision, while
the routine you're suggesting is written in terms of double_t.
So, achieving 0.502 ULP is due to having 53 bits in intermediate
results. It appears that the algorithm of the suggested code
cannot easily be generalized to double and long double without
implementing multiple-precision routines.
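
For reference, a rough sketch of this kind of exhaustive ULP measurement
(a hypothetical harness, not the one that produced the numbers above):

#include <math.h>
#include <stdio.h>

/* Error of the float result 'got', in units of the last place at the
 * float nearest to the double reference 'want'. */
static double
ulp_error(float got, double want)
{
	float ref = (float)want;
	double ulp = (double)(nextafterf(ref, INFINITY) - ref);

	return (fabs((double)got - want) / ulp);
}

int
main(void)
{
	double worst = 0.0;
	float worst_x = 0.0F;
	float x;

	/* Walk every float in [-18, 88.72]; a few billion values, which is
	 * acceptable for an offline exhaustive test.  Link with -lm. */
	for (x = -18.0F; x <= 88.72F; x = nextafterf(x, INFINITY)) {
		double err = ulp_error(expf(x), exp((double)x));

		if (err > worst) {
			worst = err;
			worst_x = x;
		}
	}
	printf("max ULP %.5f at x = %.8e\n", worst, (double)worst_x);
	return (0);
}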

Note, years ago, I submitted implementations for expf, exp,
ld80/expl, ld128/expl, logf, log, ld80/logl, and ld128/logl
based on papers by PTP Tang [1,2]. My versions for single
and double precision were not adopted even though these had
better accuracy. Either Bruce Evans improved or with Bruce's
help I improved the ld80 and ld128 routines, which were added
to msun. I know Bruce fixed minor issues with the single
and double precision routines, but he has not submitted patches.

1. PTP Tang, "Table-driven implementation of the exponential
function in IEEE floating-point arithmetic," ACM Trans. Math.
Soft., 15, 144-157 (1989).

2. PTP Tang, "Table-driven implementation of the logarithm
function in IEEE floating-point arithmetic," ACM Trans. Math.
Soft., 16, 378-400 (1990).
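
For anyone curious, a very rough sketch of the table-driven idea from [1]
(hypothetical code; it is neither Tang's exact algorithm nor the msun or ARM
implementation, and it skips all special-case and overflow handling):

#include <math.h>
#include <stdio.h>

#define TBL_BITS 5
#define TBL_SIZE (1 << TBL_BITS)	/* 32 entries: tbl[j] = 2^(j/32) */

static double tbl[TBL_SIZE];

static void
tbl_init(void)
{
	int j;

	for (j = 0; j < TBL_SIZE; j++)
		tbl[j] = exp2((double)j / TBL_SIZE);
}

/* exp(x) = 2^(k/32) * exp(r), with k = round(x * 32/ln2) and tiny r. */
static float
expf_sketch(float x)
{
	double z = (double)x * (TBL_SIZE / M_LN2);
	double k = floor(z + 0.5);
	double r = (z - k) * (M_LN2 / TBL_SIZE);	/* |r| <= ln2/64 */
	int ki = (int)k;
	int j = ki & (TBL_SIZE - 1);
	int e = (ki - j) / TBL_SIZE;	/* 2^(k/32) = 2^e * tbl[j] */
	double p = 1.0 + r * (1.0 + r * (0.5 + r * (1.0 / 6)));	/* exp(r) */

	return ((float)scalbn(tbl[j] * p, e));
}

int
main(void)
{
	tbl_init();
	printf("%.9g vs %.9g\n", (double)expf_sketch(1.0F), exp(1.0));
	return (0);
}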
--
Steve
Adhemerval Zanella
2018-05-29 12:37:07 UTC
Post by Steve Kargl
Post by Adhemerval Zanella
Post by Steve Kargl
Post by Adhemerval Zanella
And is having a different algorithm for single and double precision
a blocker for a future patch proposal?
No. Given the comment in sinf.c that max ULP is 0.56072, I do note that
the current implementation of sinf in lib/msun is more accurate (for
interesting values of x). I also looked at single/s_sincosf.c. It is
rather dubious to have 80+ digit numerical constants for a float, which
at most has 9 relevant digits.
Also keep in mind that my initial idea is to propose patches only for expf,
powf, logf, exp2f, and log2f.
OK, so I peeked at expf. Comment claims max ulp of 0.502.
Exhaustive testing for normal numbers in the relevant range for
the current implementation of expf(x) shows
Interval tested: [-18,88.72]
ULP: 0.90951, x = -5.19804668e+00f, /* 0xc0a65666 */
flt = 5.52735012e-03f, /* 0x3bb51ec6 */
dbl = 5.5273505437686398e-03, /* 0x3f76a3d8, 0xdd1aae8e */
But, then one looks at implementation details. msun's current
implementation is written in terms of single precision, while
the routine you're suggesting is written in terms of double_t.
So, achieving 0.502 ULP is due to having 53 bits in intermediate
results. It appears that the algorithm of the suggested code
cannot easily be generalized to double and long double without
implementing multiple-precision routines.
This is indeed true for the default implementation, although the same repo
has alternative implementations that use only float for expf, powf, and
logf. However, as far as I could evaluate, the optimized expf and powf
single-precision versions do not yield any gain over the current FreeBSD
versions; only for logf do I see some gains.

Do you see any issue with the current approach of using an intermediate
double_t for internal calculations?
Post by Steve Kargl
Note, years ago, I submitted implementations for expf, exp,
ld80/expl, ld128/expl, logf, log, ld80/logl, and ld128/logl
based on papers by PTP Tang [1,2]. My versions for single
and double precision were not adopted even though these had
better accuracy. Either Bruce Evans improved or with Bruce's
help I improved the ld80 and ld128 routines, which were added
to msun. I know Bruce fixed minor issues with the single
and double precision routines, but he has not submitted patches.
1. PTP Tang, "Table-driven implementation of the exponential
function in IEEE floating-point arithmetic," ACM Trans. Math.
Soft., 15, 144-157 (1989).
2. PTP Tang, "Table-driven implementation of the logarithm
function in IEEE floating-point arithmetic," ACM Trans. Math.
Soft., 16, 378-400 (1990).
Thanks for the links. Do you recall exactly why your implementations were
not adopted? Do you think a similar proposal based on the ARM repo would
also be rejected?
Steve Kargl
2018-05-29 17:32:24 UTC
Post by Adhemerval Zanella
Post by Steve Kargl
Post by Adhemerval Zanella
Post by Steve Kargl
Post by Adhemerval Zanella
And is having a different algorithm for single and double precision
a blocker for a future patch proposal?
No. Given the comment in sinf.c that max ULP is 0.56072, I do note that
the current implementation of sinf in lib/msun is more accurate (for
interesting values of x). I also looked at single/s_sincosf.c. It is
rather dubious to have 80+ digit numerical constants for a float, which
at most has 9 relevant digits.
Also keep in mind that my initial idea is to propose patches only for expf,
powf, logf, exp2f, and log2f.
OK, so I peeked at expf. Comment claims max ulp of 0.502.
Exhaustive testing for normal numbers in the relevant range for
the current implementation of expf(x) shows
Interval tested: [-18,88.72]
ULP: 0.90951, x = -5.19804668e+00f, /* 0xc0a65666 */
flt = 5.52735012e-03f, /* 0x3bb51ec6 */
dbl = 5.5273505437686398e-03, /* 0x3f76a3d8, 0xdd1aae8e */
But, then one looks at implementation details. msun's current
implementation is written in terms of single precision, while
the routine you're suggesting is written in terms of double_t.
So, achieving 0.502 ULP is due to having 53 bits in intermediate
results. It appears that the algorithm of the suggested code
cannot easily be generalized to double and long double without
implementing multiple-precision routines.
This is indeed true for the default implementation, although the same repo
has alternative implementations that use only float for expf, powf, and
logf. However, as far as I could evaluate, the optimized expf and powf
single-precision versions do not yield any gain over the current FreeBSD
versions; only for logf do I see some gains.
Do you see any issue with the current approach of using an intermediate
double_t for internal calculations?
No. The kernels for sinf and cosf (i.e., k_sinf.c and k_cosf.c)
use double for their intermediate computations. But the main
code in s_sin[fl].c and s_cos[fl].c has the same internal structure:

1) Split the argument into its integer parts (the bit words).
2) Filter special values (+-Inf, NaN).
3) Split into intervals:
   a) for small x no range reduction is needed;
   b) otherwise do range reduction into [0,pi/4].
4) In (3a), deal with subnormal numbers with care to avoid spurious
   underflow.
5) In (3b), use polynomial approximations.

Because the internal structure is similar across all precisions, it
makes maintenance easier. For maintenance and the importance of
having the same structure, see the history of s_erff.c:

https://svnweb.freebsd.org/base/head/lib/msun/src/s_erff.c?view=log
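
To make the skeleton concrete, here is a purely illustrative (hypothetical)
sinf-style routine following those five steps; it is not the msun code, and
its naive reduction loses accuracy for large |x|:

#include <math.h>

static float
sinf_skeleton(float x)
{
	double n, r, z, s, c;
	int q;

	/* (1)/(2) Classify and filter special values; real code inspects the
	 * integer bit pattern (GET_FLOAT_WORD) rather than calling isfinite(). */
	if (!isfinite(x))
		return (x - x);		/* NaN for +-Inf and NaN inputs */

	/* (3a)/(4) Small x: sin(x) ~= x, no reduction needed; returning x
	 * also avoids spurious underflow for subnormal inputs. */
	if (fabsf(x) < 0x1p-12F)
		return (x);

	/* (3b) Reduce into [-pi/4, pi/4] and remember the quadrant.  This
	 * double-precision reduction is only good for modest |x|. */
	n = round((double)x * (2.0 / M_PI));
	r = (double)x - n * (M_PI / 2);
	q = (int)fmod(n, 4.0);
	if (q < 0)
		q += 4;

	/* (5) Polynomial kernels on the reduced interval (Taylor here;
	 * msun uses minimax coefficients). */
	z = r * r;
	s = r * (1.0 + z * (-1.0 / 6 + z * (1.0 / 120 + z * (-1.0 / 5040
	    + z * (1.0 / 362880)))));
	c = 1.0 + z * (-1.0 / 2 + z * (1.0 / 24 + z * (-1.0 / 720
	    + z * (1.0 / 40320))));

	switch (q) {
	case 0:  return ((float)s);
	case 1:  return ((float)c);
	case 2:  return ((float)-s);
	default: return ((float)-c);
	}
}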
Post by Adhemerval Zanella
Post by Steve Kargl
Note, years ago, I submitted implementations for expf, exp,
ld80/expl, ld128/expl, logf, log, ld80/logl, and ld128/logl
based on papers by PTP Tang [1,2]. My versions for single
and double precision were not adopted even though these had
better accuracy. Either Bruce Evans improved or with Bruce's
help I improved the ld80 and ld128 routines, which were added
to msun. I know Bruce fixed minor issues with the single
and double precision routines, but he has not submitted patches.
1. PTP Tang, "Table-driven implementation of the exponential
function in IEEE floating-point arithmetic," ACM Trans. Math.
Soft., 15, 144-157 (1989).
2. PTP Tang, "Table-driven implementation of the logarithm
function in IEEE floating-point arithmetic," ACM Trans. Math.
Soft., 16, 378-400 (1990).
Thanks for the links. Do you recall exactly why your implementations were
not adopted? Do you think a similar proposal based on the ARM repo would
also be rejected?
Mostly due to issues on my part. Bruce was/is the only person interested
in reviewing patches to libm. At the time I submitted that code, his
comments and suggestions could be characterized as drinking from a fire
hose. When I had a commit bit, I finally gave up on the pursuit of
perfect code and simply committed s_expl.c. Later, David Das committed
s_logl.c.
--
Steve
Adhemerval Zanella
2018-05-29 20:39:40 UTC
Post by Steve Kargl
Post by Adhemerval Zanella
Post by Steve Kargl
Post by Adhemerval Zanella
Post by Steve Kargl
Post by Adhemerval Zanella
And is having a different algorithm for single and double precision
a blocker for a future patch proposal?
No. Given the comment in sinf.c that max ULP is 0.56072, I do note that
the current implementation of sinf in lib/msun is more accurate (for
interesting values of x). I also looked at single/s_sincosf.c. It is
rather dubious to have 80+ digit numerical constants for a float, which
at most has 9 relevant digits.
Also keep in mind that my initial idea is to propose patches only for expf,
powf, logf, exp2f, and log2f.
OK, so I peeked at expf. Comment claims max ulp of 0.502.
Exhaustive testing for normal numbers in the relevant range for
the current implementation of expf(x) shows
Interval tested: [-18,88.72]
ULP: 0.90951, x = -5.19804668e+00f, /* 0xc0a65666 */
flt = 5.52735012e-03f, /* 0x3bb51ec6 */
dbl = 5.5273505437686398e-03, /* 0x3f76a3d8, 0xdd1aae8e */
But, then one looks at implementation details. msun's current
implementation is written in terms of single precision, while
the routine you're suggesting is written in terms of double_t.
So, achieving 0.502 ULP is due to having 53 bits in intermediate
results. It appears that the algorithm of the suggested code
cannot easily be generalized to double and long double without
implementing multiple-precision routines.
This is indeed true for the default implementation, although the same repo
has alternative implementations that use only float for expf, powf, and
logf. However, as far as I could evaluate, the optimized expf and powf
single-precision versions do not yield any gain over the current FreeBSD
versions; only for logf do I see some gains.
Do you see any issue with the current approach of using an intermediate
double_t for internal calculations?
No. The kernels for sinf and cosf (i.e., k_sinf.c and k_cosf.c)
use double for their intermediate computations. But the main
code in s_sin[fl].c and s_cos[fl].c has the same internal structure:
1) Split the argument into its integer parts (the bit words).
2) Filter special values (+-Inf, NaN).
3) Split into intervals:
   a) for small x no range reduction is needed;
   b) otherwise do range reduction into [0,pi/4].
4) In (3a), deal with subnormal numbers with care to avoid spurious
   underflow.
5) In (3b), use polynomial approximations.
Because the internal structure is similar across all precisions, it
makes maintenance easier. For maintenance and the importance of
having the same structure, see the history of s_erff.c:
https://svnweb.freebsd.org/base/head/lib/msun/src/s_erff.c?view=log
Post by Adhemerval Zanella
Post by Steve Kargl
Note, years ago, I submitted implementations for expf, exp,
ld80/expl, ld128/expl, logf, log, ld80/logl, and ld128/logl
based on papers by PTP Tang [1,2]. My versions for single
and double precision were not adopted even though these had
better accuracy. Either Bruce Evans improved or with Bruce's
help I improved the ld80 and ld128 routines, which were added
to msun. I know Bruce fixed minor issues with the single
and double precision routines, but he has not submitted patches.
1. PTP Tang, "Table-driven implementation of the exponential
function in IEEE floating-point arithmetic," ACM Trans. Math.
Soft., 15, 144-157 (1989).
2. PTP Tang, "Table-driven implementation of the logarithm
function in IEEE floating-point arithmetic," ACM Trans. Math.
Soft., 16, 378-400 (1990).
Thanks for the links. Do you recall exactly why your implementations were
not adopted? Do you think a similar proposal based on the ARM repo would
also be rejected?
Mostly due to issues on my part. Bruce was/is the only person interested
in reviewing patches to libm. At the time I submitted that code, his
comments and suggestions could be characterized as drinking from a fire
hose. When I had a commit bit, I finally gave up on the pursuit of
perfect code and simply committed s_expl.c. Later, David Das committed
s_logl.c.
Thanks for the feedback so far; it has been valuable. The only missing bit is
the original question: do you know whether Apache-2.0 code is allowed in /usr/src?
Steve Kargl
2018-05-29 20:53:12 UTC
Post by Adhemerval Zanella
Thanks for the feedback so far; it has been valuable. The only missing bit is
the original question: do you know whether Apache-2.0 code is allowed in /usr/src?
IANAL, and reading the Apache-2.0 license gives me a
headache. The question is better addressed to the FreeBSD
Core Team or the FreeBSD Foundation. My opinion
doesn't count for much around here, but I do think
the code would be allowed.
--
Steve
Warner Losh
2018-05-29 21:51:20 UTC
On Tue, May 29, 2018 at 2:53 PM, Steve Kargl <
Post by Adhemerval Zanella
Thanks for the feedback so far; it has been valuable. The only missing bit is
the original question: do you know whether Apache-2.0 code is allowed in
/usr/src?
IANAL, and reading the Apache-2.0 license gives me a
headache. The question is better addressed to the FreeBSD
Core Team or the FreeBSD Foundation. My opinion
doesn't count for much around here, but I do think
the code would be allowed.
Not the Foundation, but the Core Team. The Core Team sets policy for the
project. The Foundation provides funding and advocacy. This is clearly a
policy call. It's not on the list right now.

Warner
