> On Saturday 20 September 2014 18:21:44 John Ralls wrote:

>> On Aug 27, 2014, at 10:31 PM, John Ralls <[hidden email]> wrote:

>>> On Aug 27, 2014, at 8:32 AM, Geert Janssens <janssens-[hidden email]> wrote:

>>>> On Saturday 23 August 2014 18:01:15 John Ralls wrote:

>>>>> So, having gotten test-lots and all of the other tests working*

>>>>> with

>>>>> libmpdecimal, I studied the Intel library for several days and

>>>>> couldn't figure out how to make it work, so I decided to try the

>>>>> GCC

>>>>> implementation, which offers a 128-bit IEEE 754 format that's

>>>>> fixed

>>>>> size. Since it doesn't ever call malloc, I thought it might prove

>>>>> faster, and indeed it is. I haven't finished integrating it -- the

>>>>> library doesn't provide formatted printing -- but it's far enough

>>>>> along that it passes all of the engine and backend tests. Some

>>>>> results:

>>>>>

>>>>> test-numeric, with NREPS increased to 20000 to get a reasonable

>>>>> execution time for profiling:

>>>>> master 9645ms

>>>>> mpDecimal 21410ms

>>>>> decNumber 12985ms

>>>>>

>>>>> test-lots:

>>>>> master 16300ms

>>>>> mpDecimal 20203ms

>>>>> decNumber 19044ms

>>>>>

>>>>> The first shows the relative speed in more or less pure

>>>>> computation,

>>>>> the latter shows the overall impact on one of the longer-running

>>>>> tests that does a lot of other stuff.

>>>>

>>>> John,

>>>>

>>>> Thanks for implementing this and running the tests. The topic was

>>>> last touched before my holidays so it took me a while to refresh

>>>> my memory...

>>>>

>>>> decNumber clearly performs better, although both implementations

>>>> lag behind our current gnc_numeric performance.

>>>>> I haven't investigated Christian's other suggestion of aggressive

>>>>> rounding to eliminate the overflow issue to make room for larger

>>>>> denominators, nor my original idea of replacing gnc_numeric with

>>>>> boost::rational atop a multi-precision class (either boost::mp or

>>>>> gmp).

>>>>

>>>> Do you still have plans for either ?

>>>>

>>>> I suppose aggressive rounding is orthogonal to the choice of data

>>>> type. Christian's argument that we should round as is expected in

>>>> the financial world makes sense to me but that argument does not

>>>> imply any underlying data type.

>>>>

>>>> How about the boost::rational option ?

>>>>

>>>>> I have noticed that we're doing some dumb things with Scheme,

>>>>> like using double as an intermediate when converting from Scheme

>>>>> numbers to gnc_numeric (Scheme numbers are also rational, so the

>>>>> conversion should be direct) and representing gnc_numerics as a

>>>>> tuple

>>>>> (num, denom) instead of just using Scheme rationals.

>>>>

>>>> Does this mean you see potential performance gains in this as we

>>>> clean up the C<->Scheme number conversions ?

>>>>> Neither will

>>>>> work for decimal floats, of course; the whole class will have to

>>>>> be

>>>>> wrapped so that computation takes place in C++.

>>>>

>>>> Which means some performance drop again...

>>>>

>>>>> Storage in SQL is

>>>>> also an issue,

>>>>

>>>> From the previous conversation I recall sqlite doesn't have a

>>>> decimal type so we can't run calculating queries on it directly.

>>>>

>>>> But how about the other two: mysql and postgresql. Is the decimal

>>>> type you're using in your tests directly compatible with the

>>>> decimal data types in mysql and postgresql, or compatible enough

>>>> to convert automatically between them ?

>>>>> as is maintaining backward file compatibility.

>>>>>

>>>>> Another issue is equality: In order to get tests to pass I've had

>>>>> to

>>>>> implement a fuzzy comparison where both numbers are first rounded

>>>>> to

>>>>> the smaller number of decimal places -- 2 fewer if there are 12 or

>>>>> more -- and compared with two roundings, first truncation and

>>>>> second

>>>>> "bankers", and declared unequal only if they're unequal in both. I

>>>>> hate this, but it seems to be necessary to obtain equality when

>>>>> dealing with large divisors (as when computing prices or interest

>>>>> rates). I suspect that we'd have to do something similar if we

>>>>> pursue

>>>>> aggressive rounding to avoid overflows, but the only way to know

>>>>> for

>>>>> certain is to try.
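
The fuzzy comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the branch: the `Dec` struct (a coefficient plus a count of decimal places) and the names `ten_to`, `rescale` and `fuzzy_equal` are hypothetical, and "2 fewer if there are 12 or more" is read here as applying to the smaller of the two place counts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical decimal value: coeff / 10^places.
struct Dec {
    int64_t coeff;
    int places;
};

enum class Round { Truncate, Bankers };

// 10^n for small non-negative n (no overflow check in this sketch).
static int64_t ten_to(int n) {
    int64_t p = 1;
    while (n-- > 0) p *= 10;
    return p;
}

// Coefficient of d after rounding it to `places` decimal places.
static int64_t rescale(const Dec& d, int places, Round how) {
    assert(places >= 0 && places <= d.places);
    const int64_t div = ten_to(d.places - places);
    int64_t q = d.coeff / div;        // integer division truncates toward zero
    const int64_t r = d.coeff % div;  // remainder has the sign of coeff
    if (how == Round::Truncate || r == 0) return q;
    const int64_t twice = 2 * (r < 0 ? -r : r);
    // Banker's rounding: round half to the nearest even quotient.
    if (twice > div || (twice == div && q % 2 != 0))
        q += (d.coeff < 0 ? -1 : 1);
    return q;
}

// Unequal only if the values differ under BOTH roundings.
bool fuzzy_equal(const Dec& a, const Dec& b) {
    int places = std::min(a.places, b.places);
    if (places >= 12) places -= 2;  // drop two more digits for long fractions
    return rescale(a, places, Round::Truncate) == rescale(b, places, Round::Truncate)
        || rescale(a, places, Round::Bankers) == rescale(b, places, Round::Bankers);
}
```

With this sketch, 1.999 and 2.00 compare equal (they differ when truncated to two places but agree under banker's rounding), while 1.234 and 1.25 compare unequal under both roundings.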

>>>>

>>>> Ugh. :(

>>>>

>>>> So what's the current balance ?

>>>>

>>>> I see following pros and cons of your tests so far:

>>>>

>>>> Pro:

>>>> - using a decimal type gives us more precision

>>>>

>>>> Con:

>>>> - sqlite doesn't have a decimal data type, so as it currently

>>>> stands we can't run calculations in queries in that database type

>>>> - we lose backward/forward compatibility with earlier versions of

>>>> GnuCash

>>>> - decNumber or mpDecimal are new dependencies

>>>> - their performance is currently less than the original gnc_numeric

>>>> - guile doesn't know of a decimal data type so we may need some

>>>> conversion glue

>>>> - equality is fuzzy

>>>>

>>>> Please add if I forgot arguments on either side.

>>>>

>>>> Arguably many of the con arguments can be solved. That will take

>>>> effort, however. And I consider the first two more important than the

>>>> others.

>>>>

>>>> So do you think the benefits (I assume there will be more than the

>>>> one I mentioned) will outweigh the drawbacks ? Does the work that

>>>> will go into it bring GnuCash enough value to continue on this

>>>> track ?

>>>>

>>>> It's probably too early to tell for sure but I wanted to get your

>>>> ideas based on what we have so far.

>>> Testing boost::rational is next on the agenda. My original idea was

>>> to use it with boost::multiprecision or gmp, but I'd prefer

>>> something that doesn't depend on heap allocations because it's so

>>> much slower than stack allocation and must be passed by pointer,

>>> which is a major change in the API -- meaning a ton of cleanup work

>>> up front. I think I'll do a straight substitution of the existing

>>> math128 with boost::rational<int64_t> just to see what happens.

>>>

>>> I think that part of implementing immediate rounding must include

>>> constraining denominators to powers-of-ten. The main reason is that

>>> it makes my head hurt when I try to think about how to do rounding

>>> with arbitrary denominators. If you consider that a big chunk of

>>> the overflow problems arise from denominators and divisors that are

>>> large primes, it becomes quickly apparent that avoiding large prime

>>> denominators might well resolve much of the problem. It's also true

>>> that for real-world numbers, as opposed to free random-generated

>>> numbers from tests, all numbers have powers-of-ten

>>> denominators. We'd still have many-digit-prime divisors to deal

>>> with, but constraining denominators gives us something to round to.

>>> Does that make sense, or does it seem the rambling of a lunatic?

>>> This really does make my head hurt.
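
The point about large prime denominators can be illustrated with a small sketch. The `Rational` struct and `add` function below are hypothetical stand-ins, not gnc_numeric; the code needs C++17 for `std::lcm`. Adding two fractions over their least common denominator keeps a power-of-ten denominator at the larger of the two, while two coprime denominators near 10^6 produce a denominator near 10^12.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// Hypothetical plain rational, used only for this illustration.
struct Rational {
    int64_t num;
    int64_t denom;
};

// Add over the least common denominator; the result is not reduced
// and no overflow checking is done in this sketch.
Rational add(const Rational& a, const Rational& b) {
    assert(a.denom > 0 && b.denom > 0);
    const int64_t d = std::lcm(a.denom, b.denom);
    return { a.num * (d / a.denom) + b.num * (d / b.denom), d };
}
```

For example, `add({1, 1000}, {1, 100})` has denominator 1000, whereas `add({1, 999983}, {1, 1000003})` has denominator 999985999949. A few more multiplications of that kind would exhaust 64 bits.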

>> Boost::Rational is a serious disappointment. Boost::rational<int64_t>

>> didn’t allow a significant increase in precision and is further

>> hampered by not providing any overflow detection. Benchmarks of

>> test-numeric with NREPS set to 20000 (the numbers are a bit different

>> from before because I’m using my Mac Pro instead of my Mac Book Air,

>> and because these are debug builds):

>>

>> Branch                   Tests    Time
>> master:                  1187558  5346ms
>> libmpdecimal:            1180076  8718ms
>> boost-rational, cppint:  1187558  20903ms
>> boost-rational, gmp:     1187558  34232ms

>>

>> cppint means boost::multiprecision::checked_int128_t, a 16-byte

>> stack allocated multi-precision integer. “Checked” means that it

>> throws std::overflow_error instead of wrapping. Gmp means the Gnu

>> Multiprecision library. It’s supposed to be faster than cppint, but

>> its performance is killed by having to malloc everything. The fact

>> that our own C code is substantially faster than any library I’ve

>> tried is a tribute to Linas.

>>

>> There’s another wrinkle: Boost::Rational immediately reduces all

>> numbers to what we called in my grade school “simplest form”, meaning

>> no common factors between the numerator and denominator. This

>> actually helps prevent overflows, but means that we have to be very

>> careful to supply the SCU as the rounding denominator or we’ll get

>> unexpected rounding results. Boost::Rational provides no rounding

>> function of its own so I rewrote gnc_numeric_convert into C++ using

>> the overloaded operators from boost::multiprecision. That at least

>> taught me about rounding arbitrary denominators, so my head doesn’t

>> explode any more.
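
Why the SCU has to be supplied explicitly after reduction, and how rounding to an arbitrary target denominator can work, might look like the sketch below. It is not the rewritten gnc_numeric_convert: `Rational`, `reduce` and `convert` are hypothetical names, it uses plain `int64_t` with no overflow detection, and it implements only banker's rounding. It needs C++17 for `std::gcd`.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// Hypothetical plain rational, used only for this illustration.
struct Rational {
    int64_t num;
    int64_t denom;
};

// Reduce to simplest form, as Boost::Rational does on every operation.
Rational reduce(const Rational& r) {
    assert(r.denom > 0);
    const int64_t g = std::gcd(r.num, r.denom);  // g >= 1 because denom > 0
    return { r.num / g, r.denom / g };
}

// Numerator of r expressed over target_denom, with banker's rounding.
int64_t convert(const Rational& r, int64_t target_denom) {
    assert(r.denom > 0 && target_denom > 0);
    const int64_t scaled = r.num * target_denom;  // may overflow for large inputs
    int64_t q = scaled / r.denom;                 // truncates toward zero
    const int64_t rem = scaled % r.denom;
    if (rem == 0) return q;
    const int64_t twice = 2 * (rem < 0 ? -rem : rem);
    // Round half to even; otherwise round to the nearest integer.
    if (twice > r.denom || (twice == r.denom && q % 2 != 0))
        q += (scaled < 0 ? -1 : 1);
    return q;
}
```

An amount stored as 250/100 reduces to 5/2, so the original denominator of 100 is no longer recoverable from the value itself; calling `convert` with the SCU of 100 gives back 250. With the same target, 3/8 rounds to 38 and 1/8 to 12, since both are exact halves and go to the even neighbour.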

>>

>> The good news is that using 128-bit numbers for all internal

>> representations along with aggressive reduction and a tweak to

>> get_random_gnc_numeric() so that the actual number doesn’t exceed

>> 1E13/1 and careful attention to rounding prevents overflow errors

>> during testing, at least up through test-lots.

>>

>> Looking a bit more at rounding, GNC_HOW_RND_NEVER is used in only 14

>> of the 151 gnc_numeric operations in the code base, so it doesn’t

>> appear to me that we’re over-using it. I’m not convinced that it would

>> help much to eliminate those cases.

>>

>> It looks like the best solution is to work over our existing

>> gnc-numeric with math128 implementation so that the internals are

>> always 128-bit and we don’t declare overflows prematurely.

>>

> Thanks for the update and the elaborate testing.

>

> So,... math128 is what we use now, using the rational representation of

> numbers, do I get that right ? And the best option is to stick with it

> and improve on it ? Would you still transform it into C++ so it becomes

> an object with properties and members ?

Yes to all.