[GNC-dev] Introduction, a story, and 50% improvement in XML loading speed

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

[GNC-dev] Introduction, a story, and 50% improvement in XML loading speed

Chris Carson
TL;DR:  hi!  I'm a programmer!  The attached patch to one line of code
gives a 50% reduction in XML load CPU use!  Skeptical?  I was.

*Introduction* (of me)

My name is Chris Carson.  I wrote code for a living from 1976-1986, and
then entered management but continued dabbling in code as a hobbyist.  C,
C++, Unix, Linux, blah blah.

*A Story*

I have financial data in Quicken dating back to 1991.  When Intuit sold off
Quicken I decided I needed a path to another financial management package.
I wrote a processor to deal with the QIF file duplicate transfer problem
(that's another story) and imported my data into Gnucash.  The resulting
XML save file is 55.8Mb (uncompressed.  Yes, I store it compressed.)

The XML file takes ~38 seconds of user CPU time to load on my build of the
Gnucash 3.3 maint stream.  (For reference, starting Gnucash with an empty
simple account file takes ~4.5 seconds of user CPU time on my machine.) I
did a relatively tedious run of callgrind which showed that about half of
that time was being consumed dom_chars_handler(...) and
checked_char_cast(...), both in the libgnucash/backend/xml directory.

Turns out dom_chars_handler(...) is called with an enormous multi-line
string.  It copies the whole thing and validates it, nibbles off a few
bytes, and returns, only to be called again with the remainder of the
enormous multi-line string to copy, validate, nibble again.

*The Patch*

I tried a couple of different fixes to this.  The patch below copies off
and validates only the bytes being consumed.  It brings the user CPU to
startup and load my XML file from ~38 seconds to ~20.5 seconds, and given
that 4.5 seconds of that is startup I make that about a 50% improvement in
load speed.  I tried a more aggressive fix for funsies and it wasn't much
better.

I have tested this *ONLY* on the load of my largeish XML file.  But the
patched code reads well.

What would you guys advise as next steps?

Patch included below signature, and separately as a file.

Regards,
Chris Carson
=====================
From b4e1911f774bfc292e97cffd2492a0257d0aee3c Mon Sep 17 00:00:00 2001
From: "Christopher D. Carson" <[hidden email]>
Date: Sun, 23 Dec 2018 20:48:02 -0600
Subject: [PATCH] Performance fix in dom_chars_handler: use g_strndup instead
 of g_strdup

Because the origin string can be extraordinarily long, you get more
benefit from this than you would imagine
---
 libgnucash/backend/xml/sixtp-to-dom-parser.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgnucash/backend/xml/sixtp-to-dom-parser.cpp
b/libgnucash/backend/xml/sixtp-to-dom-parser.cpp
index e6ba43039..9aba0801a 100644
--- a/libgnucash/backend/xml/sixtp-to-dom-parser.cpp
+++ b/libgnucash/backend/xml/sixtp-to-dom-parser.cpp
@@ -95,7 +95,7 @@ static gboolean dom_chars_handler (
 {
     if (length > 0)
     {
-        gchar* newtext = g_strdup (text);
+        gchar* newtext = g_strndup (text,length);
         xmlNodeAddContentLen ((xmlNodePtr)parent_data,
                               checked_char_cast (newtext), length);
         g_free (newtext);
--
2.19.2

_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel

0001-Performance-fix-in-dom_chars_handler-use-g_strndup-i.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] Introduction, a story, and 50% improvement in XML loading speed

Geert Janssens-4
Op maandag 24 december 2018 13:38:27 CET schreef Chris Carson:

> *The Patch*
>
> I tried a couple of different fixes to this.  The patch below copies off
> and validates only the bytes being consumed.  It brings the user CPU to
> startup and load my XML file from ~38 seconds to ~20.5 seconds, and given
> that 4.5 seconds of that is startup I make that about a 50% improvement in
> load speed.  I tried a more aggressive fix for funsies and it wasn't much
> better.
>
> I have tested this *ONLY* on the load of my largeish XML file.  But the
> patched code reads well.
>
> What would you guys advise as next steps?

No you celebrate your first patch was included in the project's code base :)
A good find, congratulations!

I have committed your fix. It is consistent with the same we also do in
generic_accumulate_chars (sixtp-utils.cpp :80)

For future submissions it's probably easier to generate a pull request on
github. I'm looking forward to those :)

Regards,

Geert


_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] Introduction, a story, and 50% improvement in XML loading speed

Geert Janssens-4
Op maandag 24 december 2018 14:29:15 CET schreef Geert Janssens:

> Op maandag 24 december 2018 13:38:27 CET schreef Chris Carson:
> > *The Patch*
> >
> > I tried a couple of different fixes to this.  The patch below copies off
> > and validates only the bytes being consumed.  It brings the user CPU to
> > startup and load my XML file from ~38 seconds to ~20.5 seconds, and given
> > that 4.5 seconds of that is startup I make that about a 50% improvement in
> > load speed.  I tried a more aggressive fix for funsies and it wasn't much
> > better.
> >
> > I have tested this *ONLY* on the load of my largeish XML file.  But the
> > patched code reads well.
> >
> > What would you guys advise as next steps?
>
> No you celebrate your first patch was included in the project's code base :)
> A good find, congratulations!

That was meant to read "Now you celebrate..."
My keyboard needs replacement.

Geert


_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

[GNC-dev] More need for speed improvement around XML (Re: Introduction, a story, and 50% improvement in XML loading speed)

Christian Stimming-4
In reply to this post by Chris Carson
Am Montag, 24. Dezember 2018, 13:38:27 CET schrieb Chris Carson:
> TL;DR:  hi!  I'm a programmer!  The attached patch to one line of code
> gives a 50% reduction in XML load CPU use!  Skeptical?  I was.

Dear Chris,

thanks a lot for this patch! I'm using gnucash since many years, too, and I'm
still somewhat unhappy with some of the reaction times in the 3.x series of
gnucash.

Now that you got a good catch on the XML loading time, maybe you are up for
some improvements on the XML saving time...? :-)  I ran some valgrind where I
could locate an expensive code section, but I'm a bit out of practice to know
the C++11 way of writing that code in a more efficient way. Here's what I
found:

When saving to XML file, for each transaction the call stack with the
expensive code walks down like this:

gnc_transaction_dom_tree_create
add_time64
time64_to_dom_tree
gnc_print_time64
GncDateTime::format
GncDateTimeImpl::format

and this very last function seems significantly more expensive than needed,
slowing down saving considerably:

std::string
GncDateTimeImpl::format(const char* format) const
{
    using Facet = boost::local_time::local_time_facet;
    std::stringstream ss;
    //The stream destructor frees the facet, so it must be heap-allocated.
    auto output_facet(new Facet(normalize_format(format).c_str()));
    // FIXME Rather than imbueing a locale below we probably should set
std::locale::global appropriately somewhere.
    ss.imbue(std::locale(std::locale(""), output_facet));
    ss << m_time;
    return ss.str();
}

Both the operator<< and the constructor std::locale(const char*) seem to be
quite costly operations. To me, it seems there is way too much construction/
destruction going on for options that are constant during each saving
operation. Maybe even the recent commit 7f1a7115 (Frank?) has slowed down
things here even more. Maybe this could be checked in some of the unittests
more easily, but I'm a bit out of practice there, too. Maybe someone is up for
some performance boosting here...? Thanks a lot in advance!

Best Regards,

Christian



_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] More need for speed improvement around XML (Re: Introduction, a story, and 50% improvement in XML loading speed)

John Ralls-2


> On Dec 28, 2018, at 2:14 PM, Christian Stimming <[hidden email]> wrote:
>
> Am Montag, 24. Dezember 2018, 13:38:27 CET schrieb Chris Carson:
>> TL;DR:  hi!  I'm a programmer!  The attached patch to one line of code
>> gives a 50% reduction in XML load CPU use!  Skeptical?  I was.
>
> Dear Chris,
>
> thanks a lot for this patch! I'm using gnucash since many years, too, and I'm
> still somewhat unhappy with some of the reaction times in the 3.x series of
> gnucash.
>
> Now that you got a good catch on the XML loading time, maybe you are up for
> some improvements on the XML saving time...? :-)  I ran some valgrind where I
> could locate an expensive code section, but I'm a bit out of practice to know
> the C++11 way of writing that code in a more efficient way. Here's what I
> found:
>
> When saving to XML file, for each transaction the call stack with the
> expensive code walks down like this:
>
> gnc_transaction_dom_tree_create
> add_time64
> time64_to_dom_tree
> gnc_print_time64
> GncDateTime::format
> GncDateTimeImpl::format
>
> and this very last function seems significantly more expensive than needed,
> slowing down saving considerably:
>
> std::string
> GncDateTimeImpl::format(const char* format) const
> {
>    using Facet = boost::local_time::local_time_facet;
>    std::stringstream ss;
>    //The stream destructor frees the facet, so it must be heap-allocated.
>    auto output_facet(new Facet(normalize_format(format).c_str()));
>    // FIXME Rather than imbueing a locale below we probably should set
> std::locale::global appropriately somewhere.
>    ss.imbue(std::locale(std::locale(""), output_facet));
>    ss << m_time;
>    return ss.str();
> }
>
> Both the operator<< and the constructor std::locale(const char*) seem to be
> quite costly operations. To me, it seems there is way too much construction/
> destruction going on for options that are constant during each saving
> operation. Maybe even the recent commit 7f1a7115 (Frank?) has slowed down
> things here even more. Maybe this could be checked in some of the unittests
> more easily, but I'm a bit out of practice there, too. Maybe someone is up for
> some performance boosting here...? Thanks a lot in advance!

Christian,

I don't have time to fully test it out right now, but give https://github.com/jralls/gnucash/tree/maint a go.

As noted in the HEAD commit it has a side effect of recording times in XML files in UTC, no timezones. Users who want to keep their files in version control will like that, but it needs to be checked for backward compatibility with 2.6 before it can be merged into maint.

Regards,
John Ralls
_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] More need for speed improvement around XML

Christian Stimming-4
Am Samstag, 29. Dezember 2018, 01:45:32 CET schrieb John Ralls:

> > When saving to XML file, for each transaction the call stack with the
> > expensive code walks down like this:
> >
> > gnc_transaction_dom_tree_create
> > add_time64
> > time64_to_dom_tree
> > gnc_print_time64
> > GncDateTime::format
> > GncDateTimeImpl::format
>
> Christian,
>
> I don't have time to fully test it out right now, but give
> https://github.com/jralls/gnucash/tree/maint a go.
>
> As noted in the HEAD commit it has a side effect of recording times in XML
> files in UTC, no timezones. Users who want to keep their files in version
> control will like that, but it needs to be checked for backward
> compatibility with 2.6 before it can be merged into maint.

Thanks for working at this. Indeed the main issue is that we don't need a
user-visible format conversion (which must respect the locale) but instead a
serialization to a fixed iso8601 format. Fixing this issue also means that the
XML files will no longer depend on the local timezone, which is an issue that
has been asked for a long time.

Thanks for your patch - this fixes performance issues with
gnc_time64_to_iso8601_buff() and all of its callers. However, the Save-to-XML
function calls gnc_print_time64() instead (from time64_to_dom_tree), which is
a different code path. Consequently, your patch alone does not modify the
saved XML (just verified). Your call to GncDateTime::format_iso8601 needs to
be used in gnc_print_time64, too, like so:

 xmlNodePtr
 time64_to_dom_tree (const char* tag, const time64 time)
 {
     xmlNodePtr ret;
     g_return_val_if_fail (time != INT64_MAX, NULL);
-    auto date_str = gnc_print_time64 (time, "%Y-%m-%d %H:%M:%S %q");
-    if (!date_str)
-        return NULL;
+    GncDateTime gncdt(time);
+    auto sstr = gncdt.format_iso8601();
+    gchar* date_str = g_strdup(sstr.c_str());


However, with this patch, all time64 timestamps in the XML file will then
change from time+timezone to UTC time only. To my surprise, although we knew
about this issue for such a long time and changed most timestamps
appropriately, there are still timestamps that will change date due to this
change. With a quick glance, the "reconciled-date" timestamp contains
beginning-of-day in a local timezone. Also maybe user-entered commodity
prices. Others might still show up, oh well.

I haven't checked for backward compatibility with 2.6, but will do during the
next days.

Regards,

Christian


_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] More need for speed improvement around XML

John Ralls-2


> On Dec 29, 2018, at 2:16 PM, Christian Stimming <[hidden email]> wrote:
>
> Am Samstag, 29. Dezember 2018, 01:45:32 CET schrieb John Ralls:
>>> When saving to XML file, for each transaction the call stack with the
>>> expensive code walks down like this:
>>>
>>> gnc_transaction_dom_tree_create
>>> add_time64
>>> time64_to_dom_tree
>>> gnc_print_time64
>>> GncDateTime::format
>>> GncDateTimeImpl::format
>>
>> Christian,
>>
>> I don't have time to fully test it out right now, but give
>> https://github.com/jralls/gnucash/tree/maint a go.
>>
>> As noted in the HEAD commit it has a side effect of recording times in XML
>> files in UTC, no timezones. Users who want to keep their files in version
>> control will like that, but it needs to be checked for backward
>> compatibility with 2.6 before it can be merged into maint.
>
> Thanks for working at this. Indeed the main issue is that we don't need a
> user-visible format conversion (which must respect the locale) but instead a
> serialization to a fixed iso8601 format. Fixing this issue also means that the
> XML files will no longer depend on the local timezone, which is an issue that
> has been asked for a long time.
>
> Thanks for your patch - this fixes performance issues with
> gnc_time64_to_iso8601_buff() and all of its callers. However, the Save-to-XML
> function calls gnc_print_time64() instead (from time64_to_dom_tree), which is
> a different code path. Consequently, your patch alone does not modify the
> saved XML (just verified). Your call to GncDateTime::format_iso8601 needs to
> be used in gnc_print_time64, too, like so:
>
> xmlNodePtr
> time64_to_dom_tree (const char* tag, const time64 time)
> {
>     xmlNodePtr ret;
>     g_return_val_if_fail (time != INT64_MAX, NULL);
> -    auto date_str = gnc_print_time64 (time, "%Y-%m-%d %H:%M:%S %q");
> -    if (!date_str)
> -        return NULL;
> +    GncDateTime gncdt(time);
> +    auto sstr = gncdt.format_iso8601();
> +    gchar* date_str = g_strdup(sstr.c_str());
>
>
> However, with this patch, all time64 timestamps in the XML file will then
> change from time+timezone to UTC time only. To my surprise, although we knew
> about this issue for such a long time and changed most timestamps
> appropriately, there are still timestamps that will change date due to this
> change. With a quick glance, the "reconciled-date" timestamp contains
> beginning-of-day in a local timezone. Also maybe user-entered commodity
> prices. Others might still show up, oh well.
>
> I haven't checked for backward compatibility with 2.6, but will do during the
> next days.

Rats, I got the two functions scrambled.

Geert and I discussed a month or two ago expanding the use of time-neutral to everywhere that we currently use day-begin and day-end except searches, but neither of us has gotten to it yet.

Regards,
John Ralls

_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] More need for speed improvement around XML

Christian Stimming-4
Am Sonntag, 30. Dezember 2018, 00:44:40 CET schrieb John Ralls:

> > On Dec 29, 2018, at 2:16 PM, Christian Stimming <[hidden email]>
> > wrote:>
> > Am Samstag, 29. Dezember 2018, 01:45:32 CET schrieb John Ralls:
> >>> When saving to XML file, for each transaction the call stack with the
> >>> expensive code walks down like this:
> >>>
> >>> gnc_transaction_dom_tree_create
> >>> add_time64
> >>> time64_to_dom_tree
> >>> gnc_print_time64
> >>> GncDateTime::format
> >>> GncDateTimeImpl::format
> >>
> >> Christian,
> >>
> >> I don't have time to fully test it out right now, but give
> >> https://github.com/jralls/gnucash/tree/maint a go.
> >>
> >> As noted in the HEAD commit it has a side effect of recording times in
> >> XML
> >> files in UTC, no timezones. Users who want to keep their files in version
> >> control will like that, but it needs to be checked for backward
> >> compatibility with 2.6 before it can be merged into maint.
> >
> > Thanks for working at this. Indeed the main issue is that we don't need a
> > user-visible format conversion (which must respect the locale) but instead
> > a serialization to a fixed iso8601 format. Fixing this issue also means
> > that the XML files will no longer depend on the local timezone, which is
> > an issue that has been asked for a long time.
> >
> > Thanks for your patch - this fixes performance issues with
> > gnc_time64_to_iso8601_buff() and all of its callers. However, the
> > Save-to-XML function calls gnc_print_time64() instead (from
> > time64_to_dom_tree), which is a different code path. Consequently, your
> > patch alone does not modify the saved XML (just verified). Your call to
> > GncDateTime::format_iso8601 needs to be used in gnc_print_time64, too,
> > like so:
> >
> > xmlNodePtr
> > time64_to_dom_tree (const char* tag, const time64 time)
> > {
> >
> >     xmlNodePtr ret;
> >     g_return_val_if_fail (time != INT64_MAX, NULL);
> >
> > -    auto date_str = gnc_print_time64 (time, "%Y-%m-%d %H:%M:%S %q");
> > -    if (!date_str)
> > -        return NULL;
> > +    GncDateTime gncdt(time);
> > +    auto sstr = gncdt.format_iso8601();
> > +    gchar* date_str = g_strdup(sstr.c_str());
> >
> >
> > However, with this patch, all time64 timestamps in the XML file will then
> > change from time+timezone to UTC time only. To my surprise, although we
> > knew about this issue for such a long time and changed most timestamps
> > appropriately, there are still timestamps that will change date due to
> > this
> > change. With a quick glance, the "reconciled-date" timestamp contains
> > beginning-of-day in a local timezone. Also maybe user-entered commodity
> > prices. Others might still show up, oh well.
> >
> > I haven't checked for backward compatibility with 2.6, but will do during
> > the next days.
>
> Rats, I got the two functions scrambled.
>
> Geert and I discussed a month or two ago expanding the use of time-neutral
> to everywhere that we currently use day-begin and day-end except searches,
> but neither of us has gotten to it yet.

For completeness: I checked loading such a file with gnucash-2.6, and as far
as I can tell, this runs just fine. All transactions and their dates show up
as usual even though the timestamps are now read as UTC instead of localtime.

Regards,
Christian


_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] More need for speed improvement around XML

John Ralls-2


> On Dec 30, 2018, at 1:25 PM, Christian Stimming <[hidden email]> wrote:
>
> Am Sonntag, 30. Dezember 2018, 00:44:40 CET schrieb John Ralls:
>>> On Dec 29, 2018, at 2:16 PM, Christian Stimming <[hidden email]>
>>> wrote:>
>>> Am Samstag, 29. Dezember 2018, 01:45:32 CET schrieb John Ralls:
>>>>> When saving to XML file, for each transaction the call stack with the
>>>>> expensive code walks down like this:
>>>>>
>>>>> gnc_transaction_dom_tree_create
>>>>> add_time64
>>>>> time64_to_dom_tree
>>>>> gnc_print_time64
>>>>> GncDateTime::format
>>>>> GncDateTimeImpl::format
>>>>
>>>> Christian,
>>>>
>>>> I don't have time to fully test it out right now, but give
>>>> https://github.com/jralls/gnucash/tree/maint a go.
>>>>
>>>> As noted in the HEAD commit it has a side effect of recording times in
>>>> XML
>>>> files in UTC, no timezones. Users who want to keep their files in version
>>>> control will like that, but it needs to be checked for backward
>>>> compatibility with 2.6 before it can be merged into maint.
>>>
>>> Thanks for working at this. Indeed the main issue is that we don't need a
>>> user-visible format conversion (which must respect the locale) but instead
>>> a serialization to a fixed iso8601 format. Fixing this issue also means
>>> that the XML files will no longer depend on the local timezone, which is
>>> an issue that has been asked for a long time.
>>>
>>> Thanks for your patch - this fixes performance issues with
>>> gnc_time64_to_iso8601_buff() and all of its callers. However, the
>>> Save-to-XML function calls gnc_print_time64() instead (from
>>> time64_to_dom_tree), which is a different code path. Consequently, your
>>> patch alone does not modify the saved XML (just verified). Your call to
>>> GncDateTime::format_iso8601 needs to be used in gnc_print_time64, too,
>>> like so:
>>>
>>> xmlNodePtr
>>> time64_to_dom_tree (const char* tag, const time64 time)
>>> {
>>>
>>>    xmlNodePtr ret;
>>>    g_return_val_if_fail (time != INT64_MAX, NULL);
>>>
>>> -    auto date_str = gnc_print_time64 (time, "%Y-%m-%d %H:%M:%S %q");
>>> -    if (!date_str)
>>> -        return NULL;
>>> +    GncDateTime gncdt(time);
>>> +    auto sstr = gncdt.format_iso8601();
>>> +    gchar* date_str = g_strdup(sstr.c_str());
>>>
>>>
>>> However, with this patch, all time64 timestamps in the XML file will then
>>> change from time+timezone to UTC time only. To my surprise, although we
>>> knew about this issue for such a long time and changed most timestamps
>>> appropriately, there are still timestamps that will change date due to
>>> this
>>> change. With a quick glance, the "reconciled-date" timestamp contains
>>> beginning-of-day in a local timezone. Also maybe user-entered commodity
>>> prices. Others might still show up, oh well.
>>>
>>> I haven't checked for backward compatibility with 2.6, but will do during
>>> the next days.
>>
>> Rats, I got the two functions scrambled.
>>
>> Geert and I discussed a month or two ago expanding the use of time-neutral
>> to everywhere that we currently use day-begin and day-end except searches,
>> but neither of us has gotten to it yet.
>
> For completeness: I checked loading such a file with gnucash-2.6, and as far
> as I can tell, this runs just fine. All transactions and their dates show up
> as usual even though the timestamps are now read as UTC instead of localtime.

The timestamps have always been UTC, they've just been written oddly. It's explained in comments somewhere, I'll summarize: We've historically written timestamps as an ISO8607-like string, i.e. YYYY-MM-DD HH:MM:SS with an offset representing a timezone, ±HHMM. For example my TZ offset is -0800, so the date-time right now is 2018-12-30 14:26:24 -0800. The parser adds back the 8 hours and stores whatever 2018-12-30 22:26:24 is in seconds since the epoch. Losing the offset doesn't change anything except to make less work for
the parser and reducing the diff of two files written with different offsets.

It's good news that the 2.6 parser reads them correctly.

Since SQL doesn't understand offsets we've always used plain UTC date times in the SQL backend.

Regards,
John Ralls

_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] More need for speed improvement around XML

John Ralls-2


> On Dec 30, 2018, at 4:52 PM, John Ralls <[hidden email]> wrote:
>
>
>
>> On Dec 30, 2018, at 1:25 PM, Christian Stimming <[hidden email]> wrote:
>>
>> Am Sonntag, 30. Dezember 2018, 00:44:40 CET schrieb John Ralls:
>>>> On Dec 29, 2018, at 2:16 PM, Christian Stimming <[hidden email]>
>>>> wrote:>
>>>> Am Samstag, 29. Dezember 2018, 01:45:32 CET schrieb John Ralls:
>>>>>> When saving to XML file, for each transaction the call stack with the
>>>>>> expensive code walks down like this:
>>>>>>
>>>>>> gnc_transaction_dom_tree_create
>>>>>> add_time64
>>>>>> time64_to_dom_tree
>>>>>> gnc_print_time64
>>>>>> GncDateTime::format
>>>>>> GncDateTimeImpl::format
>>>>>
>>>>> Christian,
>>>>>
>>>>> I don't have time to fully test it out right now, but give
>>>>> https://github.com/jralls/gnucash/tree/maint a go.
>>>>>
>>>>> As noted in the HEAD commit it has a side effect of recording times in
>>>>> XML
>>>>> files in UTC, no timezones. Users who want to keep their files in version
>>>>> control will like that, but it needs to be checked for backward
>>>>> compatibility with 2.6 before it can be merged into maint.
>>>>
>>>> Thanks for working at this. Indeed the main issue is that we don't need a
>>>> user-visible format conversion (which must respect the locale) but instead
>>>> a serialization to a fixed iso8601 format. Fixing this issue also means
>>>> that the XML files will no longer depend on the local timezone, which is
>>>> an issue that has been asked for a long time.
>>>>
>>>> Thanks for your patch - this fixes performance issues with
>>>> gnc_time64_to_iso8601_buff() and all of its callers. However, the
>>>> Save-to-XML function calls gnc_print_time64() instead (from
>>>> time64_to_dom_tree), which is a different code path. Consequently, your
>>>> patch alone does not modify the saved XML (just verified). Your call to
>>>> GncDateTime::format_iso8601 needs to be used in gnc_print_time64, too,
>>>> like so:
>>>>
>>>> xmlNodePtr
>>>> time64_to_dom_tree (const char* tag, const time64 time)
>>>> {
>>>>
>>>>   xmlNodePtr ret;
>>>>   g_return_val_if_fail (time != INT64_MAX, NULL);
>>>>
>>>> -    auto date_str = gnc_print_time64 (time, "%Y-%m-%d %H:%M:%S %q");
>>>> -    if (!date_str)
>>>> -        return NULL;
>>>> +    GncDateTime gncdt(time);
>>>> +    auto sstr = gncdt.format_iso8601();
>>>> +    gchar* date_str = g_strdup(sstr.c_str());
>>>>
>>>>
>>>> However, with this patch, all time64 timestamps in the XML file will then
>>>> change from time+timezone to UTC time only. To my surprise, although we
>>>> knew about this issue for such a long time and changed most timestamps
>>>> appropriately, there are still timestamps that will change date due to
>>>> this
>>>> change. With a quick glance, the "reconciled-date" timestamp contains
>>>> beginning-of-day in a local timezone. Also maybe user-entered commodity
>>>> prices. Others might still show up, oh well.
>>>>
>>>> I haven't checked for backward compatibility with 2.6, but will do during
>>>> the next days.
>>>
>>> Rats, I got the two functions scrambled.
>>>
>>> Geert and I discussed a month or two ago expanding the use of time-neutral
>>> to everywhere that we currently use day-begin and day-end except searches,
>>> but neither of us has gotten to it yet.
>>
>> For completeness: I checked loading such a file with gnucash-2.6, and as far
>> as I can tell, this runs just fine. All transactions and their dates show up
>> as usual even though the timestamps are now read as UTC instead of localtime.
>
> The timestamps have always been UTC, they've just been written oddly. It's explained in comments somewhere, I'll summarize: We've historically written timestamps as an ISO8607-like string, i.e. YYYY-MM-DD HH:MM:SS with an offset representing a timezone, ±HHMM. For example my TZ offset is -0800, so the date-time right now is 2018-12-30 14:26:24 -0800. The parser adds back the 8 hours and stores whatever 2018-12-30 22:26:24 is in seconds since the epoch. Losing the offset doesn't change anything except to make less work for
> the parser and reducing the diff of two files written with different offsets.
>
> It's good news that the 2.6 parser reads them correctly.
>
> Since SQL doesn't understand offsets we've always used plain UTC date times in the SQL backend.

I've merged Chris Carson's PRs to speed up the scrub and to get the C++ locale only once and I've fixed up my format_iso8601 so that it's used in both backends instead of GncDateTime::format and pushed that as well.

Regards,
John Ralls

_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

[GNC-dev] Static locale commit unfortunately buggy

Christian Stimming-4
In reply to this post by John Ralls-2
Dear John,

unfortunately your commit b4fedff9 of last weekend breaks the correct locale
treatment. I didn't investigate this in detail, but the date status bar
message changed to english with this commit instead of sticking to my local
language. This is with LANG=de_DE.UTF-8
Could you have a look into this? Thanks a lot!

Regards,
Christian



_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] Static locale commit unfortunately buggy

John Ralls-2


> On Jan 12, 2019, at 3:31 AM, Christian Stimming <[hidden email]> wrote:
>
> Dear John,
>
> unfortunately your commit b4fedff9 of last weekend breaks the correct locale
> treatment. I didn't investigate this in detail, but the date status bar
> message changed to english with this commit instead of sticking to my local
> language. This is with LANG=de_DE.UTF-8
> Could you have a look into this? Thanks a lot!

Christian,

Thanks for noticing. It seems that the std::locale objects created by boost::locale aren't quite compatible. I've resolved the problem, at least for maint, by using boost::locale to format date-times instead of boost::date_time.

For the future we can consider another reimplementation of GncDateTime. boost::date_time is an oldish library and hasn't been improved much since it was accepted into boost. It's a bit quirky and doesn't perform very well. std::chrono in C++20 has calendar and timezone functions, so that's one possible replacement, though not until at least llvm and gcc implement the expanded std::chrono and those implementations are widely available. There's also ICU as wrapped by boost::locale. I demurred from that when I picked boost::date_time because it's huge, but if we're pulling it in anyway it's worth reconsidering.

Regards,
John Ralls

_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] Static locale commit unfortunately buggy

Christian Stimming-4
Am Samstag, 19. Januar 2019, 23:21:58 CET schrieb John Ralls:
> Christian,
>
> Thanks for noticing. It seems that the std::locale objects created by
> boost::locale aren't quite compatible. I've resolved the problem, at least
> for maint, by using boost::locale to format date-times instead of
> boost::date_time.

Dear John,

thanks for working on this. Now it is fine again and all fixed. Thank you very
much.

Regards,
Christian

>
> For the future we can consider another reimplementation of GncDateTime.
> boost::date_time is an oldish library and hasn't been improved much since
> it was accepted into boost. It's a bit quirky and doesn't perform very
> well. std::chrono in C++20 has calendar and timezone functions, so that's
> one possible replacement, though not until at least llvm and gcc implement
> the expanded std::chrono and those implementations are widely available.
> There's also ICU as wrapped by boost::locale. I demurred from that when I
> picked boost::date_time because it's huge, but if we're pulling it in
> anyway it's worth reconsidering.
>
> Regards,
> John Ralls




_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel