[GNC-dev] Bug 797463 - CSV Import of transactions into a new file hangs

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

[GNC-dev] Bug 797463 - CSV Import of transactions into a new file hangs

Christian Gruber
I have some questions related to Bug 797463
<https://bugs.gnucash.org/show_bug.cgi?id=797463>, which I have analyzed.

The author wrote, that "Gnu Cash hangs", when importing (only two)
transactions into a Gnu Cash file with standard accounts list SKR04.

My analysis showed, that actually Gnu Cash does not hang, but needs
really long time for import (several minutes). The problem is, that the
author has bayesian matching enabled in his preferences, but the Gnu
Cash file is not prepared for bayesian matching (feature
GNC_FEATURE_GUID_FLAT_BAYESIAN is not set in the Gnu Cash file).
Moreover the Gnu Cash file contains a lot of accounts (approx. 1000).

Most time is spent in function check_import_map_data(), which is called
from gnc_account_imap_find_account_bayes() (Account.cpp). In this
function imap_convert_bayes_to_flat() is called, which AFAICS prepares
all accounts for bayesian matching.

I'm not familiar with that conversion step and therefore have several
questions:

Why does the conversion need so much CPU time?

If the conversion needs so much CPU time, why is it done only
temporarily? The conversion is done for each of the two transactions again.

Why is the conversion even not persistent after the import is done? The
feature GNC_FEATURE_GUID_FLAT_BAYESIAN is not set in the Gnu Cash file,
even not after the import is finished.

Regards,
Christian

_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] Bug 797463 - CSV Import of transactions into a new file hangs

Christian Gruber
Can anybody provide help?

The last change on the relevant functions was done in commit fbf4843f31
by "lmat" in Dec 2017 between GnuCash versions 2.6 and 2.7. And the
commit message seems to fit.

Christian

Am 04.11.19 um 20:28 schrieb Christian Gruber:

> I have some questions related to Bug 797463
> <https://bugs.gnucash.org/show_bug.cgi?id=797463>, which I have analyzed.
>
> The author wrote, that "Gnu Cash hangs", when importing (only two)
> transactions into a Gnu Cash file with standard accounts list SKR04.
>
> My analysis showed, that actually Gnu Cash does not hang, but needs
> really long time for import (several minutes). The problem is, that
> the author has bayesian matching enabled in his preferences, but the
> Gnu Cash file is not prepared for bayesian matching (feature
> GNC_FEATURE_GUID_FLAT_BAYESIAN is not set in the Gnu Cash file).
> Moreover the Gnu Cash file contains a lot of accounts (approx. 1000).
>
> Most time is spent in function check_import_map_data(), which is
> called from gnc_account_imap_find_account_bayes() (Account.cpp). In
> this function imap_convert_bayes_to_flat() is called, which AFAICS
> prepares all accounts for bayesian matching.
>
> I'm not familiar with that conversion step and therefore have several
> questions:
>
> Why does the conversion need so much CPU time?
>
> If the conversion needs so much CPU time, why is it done only
> temporarily? The conversion is done for each of the two transactions
> again.
>
> Why is the conversion even not persistent after the import is done?
> The feature GNC_FEATURE_GUID_FLAT_BAYESIAN is not set in the Gnu Cash
> file, even not after the import is finished.
>
> Regards,
> Christian
>
_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] Bug 797463 - CSV Import of transactions into a new file hangs

John Ralls-2
Christian,

It's not that it's not prepared for Bayesian matching, it's that older versions of GnuCash stored the Bayesian match tokens hierarchically. Aaron Laws (lmat) changed it to a flatter structure with somewhat better memory locality for faster access. imap_convert_bayes_to_flat should run once to convert the data and set the feature, after which check_import_map_data will see the flag and return. A file created with 3.x and Baysian maps would already have the feature set.


With that background, to your questions:

Why does it take so long? Because it traverses the entire tree of accounts, every time. The test book has 1127 accounts. Add to that that there are some things inside the loop that shouldn't be and that convert_imap_account_bayes_to_flat doesn't use some obvious short circuits and you get taking a long time.

Why does it run twice? Because there aren't any accounts with import-map-bayes slots, so it does no conversions so it doesn't set the feature.

Why is the feature not set after the import? It should be if it's actually setting any matches. That's done at the end of change_imap_entry. If you're quitting the matcher without associating the transactions to accounts it won't call change_imap_entry and set the feature.

Regards,
John Ralls


> On Nov 7, 2019, at 2:41 PM, Christian Gruber <[hidden email]> wrote:
>
> Can anybody provide help?
>
> The last change on the relevant functions was done in commit fbf4843f31 by "lmat" in Dec 2017 between GnuCash versions 2.6 and 2.7. And the commit message seems to fit.
>
> Christian
>
> Am 04.11.19 um 20:28 schrieb Christian Gruber:
>> I have some questions related to Bug 797463 <https://bugs.gnucash.org/show_bug.cgi?id=797463>, which I have analyzed.
>>
>> The author wrote, that "Gnu Cash hangs", when importing (only two) transactions into a Gnu Cash file with standard accounts list SKR04.
>>
>> My analysis showed, that actually Gnu Cash does not hang, but needs really long time for import (several minutes). The problem is, that the author has bayesian matching enabled in his preferences, but the Gnu Cash file is not prepared for bayesian matching (feature GNC_FEATURE_GUID_FLAT_BAYESIAN is not set in the Gnu Cash file). Moreover the Gnu Cash file contains a lot of accounts (approx. 1000).
>>
>> Most time is spent in function check_import_map_data(), which is called from gnc_account_imap_find_account_bayes() (Account.cpp). In this function imap_convert_bayes_to_flat() is called, which AFAICS prepares all accounts for bayesian matching.
>>
>> I'm not familiar with that conversion step and therefore have several questions:
>>
>> Why does the conversion need so much CPU time?
>>
>> If the conversion needs so much CPU time, why is it done only temporarily? The conversion is done for each of the two transactions again.
>>
>> Why is the conversion even not persistent after the import is done? The feature GNC_FEATURE_GUID_FLAT_BAYESIAN is not set in the Gnu Cash file, even not after the import is finished.
>>
>> Regards,
>> Christian
>>
> _______________________________________________
> gnucash-devel mailing list
> [hidden email]
> https://lists.gnucash.org/mailman/listinfo/gnucash-devel

_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] Bug 797463 - CSV Import of transactions into a new file hangs

Christian Gruber
Am 08.11.19 um 04:39 schrieb John Ralls:

> Christian,
>
> It's not that it's not prepared for Bayesian matching, it's that older versions of GnuCash stored the Bayesian match tokens hierarchically. Aaron Laws (lmat) changed it to a flatter structure with somewhat better memory locality for faster access. imap_convert_bayes_to_flat should run once to convert the data and set the feature, after which check_import_map_data will see the flag and return. A file created with 3.x and Baysian maps would already have the feature set.
>
>
> With that background, to your questions:
>
> Why does it take so long? Because it traverses the entire tree of accounts, every time. The test book has 1127 accounts. Add to that that there are some things inside the loop that shouldn't be and that convert_imap_account_bayes_to_flat doesn't use some obvious short circuits and you get taking a long time.
>
> Why does it run twice? Because there aren't any accounts with import-map-bayes slots, so it does no conversions so it doesn't set the feature.

Why isn't the feature set in any case after conversion is done, whether
there was any slot to convert or not?

I would expect that, if the hierarchical structure should not be used
anymore. I wouldn't only set the feature, if there has been any error
during conversion. But it's not an error, if
convert_imap_account_bayes_to_flat() returns false.

>
> Why is the feature not set after the import? It should be if it's actually setting any matches. That's done at the end of change_imap_entry. If you're quitting the matcher without associating the transactions to accounts it won't call change_imap_entry and set the feature.
>
> Regards,
> John Ralls
>
>
>> On Nov 7, 2019, at 2:41 PM, Christian Gruber <[hidden email]> wrote:
>>
>> Can anybody provide help?
>>
>> The last change on the relevant functions was done in commit fbf4843f31 by "lmat" in Dec 2017 between GnuCash versions 2.6 and 2.7. And the commit message seems to fit.
>>
>> Christian
>>
>> Am 04.11.19 um 20:28 schrieb Christian Gruber:
>>> I have some questions related to Bug 797463 <https://bugs.gnucash.org/show_bug.cgi?id=797463>, which I have analyzed.
>>>
>>> The author wrote, that "Gnu Cash hangs", when importing (only two) transactions into a Gnu Cash file with standard accounts list SKR04.
>>>
>>> My analysis showed, that actually Gnu Cash does not hang, but needs really long time for import (several minutes). The problem is, that the author has bayesian matching enabled in his preferences, but the Gnu Cash file is not prepared for bayesian matching (feature GNC_FEATURE_GUID_FLAT_BAYESIAN is not set in the Gnu Cash file). Moreover the Gnu Cash file contains a lot of accounts (approx. 1000).
>>>
>>> Most time is spent in function check_import_map_data(), which is called from gnc_account_imap_find_account_bayes() (Account.cpp). In this function imap_convert_bayes_to_flat() is called, which AFAICS prepares all accounts for bayesian matching.
>>>
>>> I'm not familiar with that conversion step and therefore have several questions:
>>>
>>> Why does the conversion need so much CPU time?
>>>
>>> If the conversion needs so much CPU time, why is it done only temporarily? The conversion is done for each of the two transactions again.
>>>
>>> Why is the conversion even not persistent after the import is done? The feature GNC_FEATURE_GUID_FLAT_BAYESIAN is not set in the Gnu Cash file, even not after the import is finished.
>>>
>>> Regards,
>>> Christian
>>>
>> _______________________________________________
>> gnucash-devel mailing list
>> [hidden email]
>> https://lists.gnucash.org/mailman/listinfo/gnucash-devel
_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] Bug 797463 - CSV Import of transactions into a new file hangs

John Ralls-2


> On Nov 8, 2019, at 1:58 PM, Christian Gruber <[hidden email]> wrote:
>
> Am 08.11.19 um 04:39 schrieb John Ralls:
>> Christian,
>>
>> It's not that it's not prepared for Bayesian matching, it's that older versions of GnuCash stored the Bayesian match tokens hierarchically. Aaron Laws (lmat) changed it to a flatter structure with somewhat better memory locality for faster access. imap_convert_bayes_to_flat should run once to convert the data and set the feature, after which check_import_map_data will see the flag and return. A file created with 3.x and Baysian maps would already have the feature set.
>>
>>
>> With that background, to your questions:
>>
>> Why does it take so long? Because it traverses the entire tree of accounts, every time. The test book has 1127 accounts. Add to that that there are some things inside the loop that shouldn't be and that convert_imap_account_bayes_to_flat doesn't use some obvious short circuits and you get taking a long time.
>>
>> Why does it run twice? Because there aren't any accounts with import-map-bayes slots, so it does no conversions so it doesn't set the feature.
>
> Why isn't the feature set in any case after conversion is done, whether there was any slot to convert or not?
>
> I would expect that, if the hierarchical structure should not be used anymore. I wouldn't only set the feature, if there has been any error during conversion. But it's not an error, if convert_imap_account_bayes_to_flat() returns false.

The feature isn't set because that would prevent using the file with an older version of GnuCash for no good reason: There aren't any import-map-bayes tags in the new format so there's nothing for the older version to mess up.

Regards,
John Ralls

_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] Bug 797463 - CSV Import of transactions into a new file hangs

Christian Gruber

Am 08.11.19 um 23:08 schrieb John Ralls:

>
>> On Nov 8, 2019, at 1:58 PM, Christian Gruber <[hidden email]> wrote:
>>
>> Am 08.11.19 um 04:39 schrieb John Ralls:
>>> Christian,
>>>
>>> It's not that it's not prepared for Bayesian matching, it's that older versions of GnuCash stored the Bayesian match tokens hierarchically. Aaron Laws (lmat) changed it to a flatter structure with somewhat better memory locality for faster access. imap_convert_bayes_to_flat should run once to convert the data and set the feature, after which check_import_map_data will see the flag and return. A file created with 3.x and Baysian maps would already have the feature set.
>>>
>>>
>>> With that background, to your questions:
>>>
>>> Why does it take so long? Because it traverses the entire tree of accounts, every time. The test book has 1127 accounts. Add to that that there are some things inside the loop that shouldn't be and that convert_imap_account_bayes_to_flat doesn't use some obvious short circuits and you get taking a long time.
>>>
>>> Why does it run twice? Because there aren't any accounts with import-map-bayes slots, so it does no conversions so it doesn't set the feature.
>> Why isn't the feature set in any case after conversion is done, whether there was any slot to convert or not?
>>
>> I would expect that, if the hierarchical structure should not be used anymore. I wouldn't only set the feature, if there has been any error during conversion. But it's not an error, if convert_imap_account_bayes_to_flat() returns false.
> The feature isn't set because that would prevent using the file with an older version of GnuCash for no good reason: There aren't any import-map-bayes tags in the new format so there's nothing for the older version to mess up.
>
> Regards,
> John Ralls
>
Sounds meaningful, but it's not obvious. The conversion process is not
much transparent to the user. The user is not informed about conversion
and additionally the conditions, when the GnuCash file is converted and
when not, are not quite intuitive, I guess. And the second question for
me is, if this is really a relevant use case to keep the GnuCash file
unconverted if possible, when it is used with a newer GnuCash version.
Maybe it's relevant, if several people, who use different GnuCash
versions work on the same GnuCash file. But what happens, when one with
a newer GnuCash version uses the bayesian import matcher for the first
time, which silently converts the GnuCash file? Will it still be usable
with an older GnuCash version?

But actually this is another discussion, which is not important for the
bug ticket. What's more important, I created a new fresh GnuCash file
with the SKR04 accounts template using my GnuCash version 3.7 and if I
checked this correctly, even with this GnuCash version the feature
GNC_FEATURE_GUID_FLAT_BAYESIAN is not set in the GnuCash file. Can you
confirm this? If yes, then the bug report is relevant not only for
migration from older to current GnuCash versions but even for current
GnuCash versions in general.

Regards,
Christian


_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] Bug 797463 - CSV Import of transactions into a new file hangs

John Ralls-2


> On Nov 10, 2019, at 12:07 PM, Christian Gruber <[hidden email]> wrote:
>
>
> Am 08.11.19 um 23:08 schrieb John Ralls:
>>
>>> On Nov 8, 2019, at 1:58 PM, Christian Gruber <[hidden email]> wrote:
>>>
>>> Am 08.11.19 um 04:39 schrieb John Ralls:
>>>> Christian,
>>>>
>>>> It's not that it's not prepared for Bayesian matching, it's that older versions of GnuCash stored the Bayesian match tokens hierarchically. Aaron Laws (lmat) changed it to a flatter structure with somewhat better memory locality for faster access. imap_convert_bayes_to_flat should run once to convert the data and set the feature, after which check_import_map_data will see the flag and return. A file created with 3.x and Baysian maps would already have the feature set.
>>>>
>>>>
>>>> With that background, to your questions:
>>>>
>>>> Why does it take so long? Because it traverses the entire tree of accounts, every time. The test book has 1127 accounts. Add to that that there are some things inside the loop that shouldn't be and that convert_imap_account_bayes_to_flat doesn't use some obvious short circuits and you get taking a long time.
>>>>
>>>> Why does it run twice? Because there aren't any accounts with import-map-bayes slots, so it does no conversions so it doesn't set the feature.
>>> Why isn't the feature set in any case after conversion is done, whether there was any slot to convert or not?
>>>
>>> I would expect that, if the hierarchical structure should not be used anymore. I wouldn't only set the feature, if there has been any error during conversion. But it's not an error, if convert_imap_account_bayes_to_flat() returns false.
>> The feature isn't set because that would prevent using the file with an older version of GnuCash for no good reason: There aren't any import-map-bayes tags in the new format so there's nothing for the older version to mess up.
>>
>> Regards,
>> John Ralls
>>
> Sounds meaningful, but it's not obvious. The conversion process is not much transparent to the user. The user is not informed about conversion and additionally the conditions, when the GnuCash file is converted and when not, are not quite intuitive, I guess. And the second question for me is, if this is really a relevant use case to keep the GnuCash file unconverted if possible, when it is used with a newer GnuCash version. Maybe it's relevant, if several people, who use different GnuCash versions work on the same GnuCash file. But what happens, when one with a newer GnuCash version uses the bayesian import matcher for the first time, which silently converts the GnuCash file? Will it still be usable with an older GnuCash version?
>
> But actually this is another discussion, which is not important for the bug ticket. What's more important, I created a new fresh GnuCash file with the SKR04 accounts template using my GnuCash version 3.7 and if I checked this correctly, even with this GnuCash version the feature GNC_FEATURE_GUID_FLAT_BAYESIAN is not set in the GnuCash file. Can you confirm this? If yes, then the bug report is relevant not only for migration from older to current GnuCash versions but even for current GnuCash versions in general.

No, it's the same discussion. If you don't use the feature then the flag for it won't be set. That's how we designed the feature mechanism. The use case is straightforward: Some user has two computers, a laptop running Windows and a desktop running Ubuntu 16.04. She updates the Windows machine with each release but doesn't know how to build GnuCash on her Ubuntu machine so leaves it at whatever they shipped on 16.04; 2.6.12 IIRC. As long as she doesn't use any of the features supported only in 3.x she can continue to use both systems to keep her books and regardless of which machine she uses to create a book.

The "silently" part could be better. It would indeed be a nice enhancement to pop a dialog box when GnuCash is about to set a feature flag explaining that it's use will preclude using the file on a GnuCash version older than X and giving the user a chance to bail out. It would be nicer still to persist that decision so that it need ask only once, and if the user assents it could set the feature flag even if the feature isn't used, as is the case here. Of course there would then be the need to reset negative responses so that when our hypothetical user upgrades her desktop to Ubuntu 20.04 next spring she can turn on all of those 3.x only features she'd previously declined.

It occurs to me that another improvement to the existing flat Bayesian conversion would be to check the state of the GNC_PREF_USE_BAYES and skip the conversion if it's not true.

Regards,
John Ralls

_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] Bug 797463 - CSV Import of transactions into a new file hangs

Christian Gruber

Am 10.11.19 um 22:39 schrieb John Ralls:

>
>> On Nov 10, 2019, at 12:07 PM, Christian Gruber <[hidden email]> wrote:
>>
>>
>> Am 08.11.19 um 23:08 schrieb John Ralls:
>>>> On Nov 8, 2019, at 1:58 PM, Christian Gruber <[hidden email]> wrote:
>>>>
>>>> Am 08.11.19 um 04:39 schrieb John Ralls:
>>>>> Christian,
>>>>>
>>>>> It's not that it's not prepared for Bayesian matching, it's that older versions of GnuCash stored the Bayesian match tokens hierarchically. Aaron Laws (lmat) changed it to a flatter structure with somewhat better memory locality for faster access. imap_convert_bayes_to_flat should run once to convert the data and set the feature, after which check_import_map_data will see the flag and return. A file created with 3.x and Baysian maps would already have the feature set.
>>>>>
>>>>>
>>>>> With that background, to your questions:
>>>>>
>>>>> Why does it take so long? Because it traverses the entire tree of accounts, every time. The test book has 1127 accounts. Add to that that there are some things inside the loop that shouldn't be and that convert_imap_account_bayes_to_flat doesn't use some obvious short circuits and you get taking a long time.
>>>>>
>>>>> Why does it run twice? Because there aren't any accounts with import-map-bayes slots, so it does no conversions so it doesn't set the feature.
>>>> Why isn't the feature set in any case after conversion is done, whether there was any slot to convert or not?
>>>>
>>>> I would expect that, if the hierarchical structure should not be used anymore. I wouldn't only set the feature, if there has been any error during conversion. But it's not an error, if convert_imap_account_bayes_to_flat() returns false.
>>> The feature isn't set because that would prevent using the file with an older version of GnuCash for no good reason: There aren't any import-map-bayes tags in the new format so there's nothing for the older version to mess up.
>>>
>>> Regards,
>>> John Ralls
>>>
>> Sounds meaningful, but it's not obvious. The conversion process is not much transparent to the user. The user is not informed about conversion and additionally the conditions, when the GnuCash file is converted and when not, are not quite intuitive, I guess. And the second question for me is, if this is really a relevant use case to keep the GnuCash file unconverted if possible, when it is used with a newer GnuCash version. Maybe it's relevant, if several people, who use different GnuCash versions work on the same GnuCash file. But what happens, when one with a newer GnuCash version uses the bayesian import matcher for the first time, which silently converts the GnuCash file? Will it still be usable with an older GnuCash version?
>>
>> But actually this is another discussion, which is not important for the bug ticket. What's more important, I created a new fresh GnuCash file with the SKR04 accounts template using my GnuCash version 3.7 and if I checked this correctly, even with this GnuCash version the feature GNC_FEATURE_GUID_FLAT_BAYESIAN is not set in the GnuCash file. Can you confirm this? If yes, then the bug report is relevant not only for migration from older to current GnuCash versions but even for current GnuCash versions in general.
> No, it's the same discussion. If you don't use the feature then the flag for it won't be set. That's how we designed the feature mechanism. The use case is straightforward: Some user has two computers, a laptop running Windows and a desktop running Ubuntu 16.04. She updates the Windows machine with each release but doesn't know how to build GnuCash on her Ubuntu machine so leaves it at whatever they shipped on 16.04; 2.6.12 IIRC. As long as she doesn't use any of the features supported only in 3.x she can continue to use both systems to keep her books and regardless of which machine she uses to create a book.
>
> The "silently" part could be better. It would indeed be a nice enhancement to pop a dialog box when GnuCash is about to set a feature flag explaining that it's use will preclude using the file on a GnuCash version older than X and giving the user a chance to bail out. It would be nicer still to persist that decision so that it need ask only once, and if the user assents it could set the feature flag even if the feature isn't used, as is the case here. Of course there would then be the need to reset negative responses so that when our hypothetical user upgrades her desktop to Ubuntu 20.04 next spring she can turn on all of those 3.x only features she'd previously declined.
>
> It occurs to me that another improvement to the existing flat Bayesian conversion would be to check the state of the GNC_PREF_USE_BAYES and skip the conversion if it's not true.
This is already implemented. Flag GNC_PREF_USE_BAYES is checked in
functions matchmap_find_destination() and matchmap_store_destination()
and if not set, than functions gnc_account_imap_find_account_bayes() and
gnc_account_imap_add_account_bayes() aren't invoked and no conversion
will be done.
>
> Regards,
> John Ralls
>
Thanks for the revealing explanation of the feature mechanism. In this
case it would be really helpful, if GnuCash would explain itself and
interact with the user, when features are changed and conversions are
done, which are not backwards compatible. Moreover it would be
meaningful, if the user could "enable/disable" backwards compatibility
when creating a new GnuCash file to avoid such extensive conversion
steps. This means, that all supported features of the used GnuCash
version are set by default during creation. Alternatively, if it's
possible to significantly speed up searching for (non-existing) bayesian
match tokens, this should be preferred.

I'll create an enhancement ticket.

Regards,
Christian


_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] Bug 797463 - CSV Import of transactions into a new file hangs

David Carlson-4
Christian, After you create an enhancement ticket, presumably generalized
for all applications of the undocumented feature upgrade details for
typical cases, please report the bug number here so others that do not get
all new bug reports automatically can find it more easily.

David Carlson

On Sun, Nov 10, 2019 at 4:29 PM Christian Gruber <[hidden email]>
wrote:

>
> Am 10.11.19 um 22:39 schrieb John Ralls:
> >
> >> On Nov 10, 2019, at 12:07 PM, Christian Gruber <
> [hidden email]> wrote:
> >>
> >>
> >> Am 08.11.19 um 23:08 schrieb John Ralls:
> >>>> On Nov 8, 2019, at 1:58 PM, Christian Gruber <
> [hidden email]> wrote:
> >>>>
> >>>> Am 08.11.19 um 04:39 schrieb John Ralls:
> >>>>> Christian,
> >>>>>
> >>>>> It's not that it's not prepared for Bayesian matching, it's that
> older versions of GnuCash stored the Bayesian match tokens hierarchically.
> Aaron Laws (lmat) changed it to a flatter structure with somewhat better
> memory locality for faster access. imap_convert_bayes_to_flat should run
> once to convert the data and set the feature, after which
> check_import_map_data will see the flag and return. A file created with 3.x
> and Baysian maps would already have the feature set.
> >>>>>
> >>>>>
> >>>>> With that background, to your questions:
> >>>>>
> >>>>> Why does it take so long? Because it traverses the entire tree of
> accounts, every time. The test book has 1127 accounts. Add to that that
> there are some things inside the loop that shouldn't be and that
> convert_imap_account_bayes_to_flat doesn't use some obvious short circuits
> and you get taking a long time.
> >>>>>
> >>>>> Why does it run twice? Because there aren't any accounts with
> import-map-bayes slots, so it does no conversions so it doesn't set the
> feature.
> >>>> Why isn't the feature set in any case after conversion is done,
> whether there was any slot to convert or not?
> >>>>
> >>>> I would expect that, if the hierarchical structure should not be used
> anymore. I wouldn't only set the feature, if there has been any error
> during conversion. But it's not an error, if
> convert_imap_account_bayes_to_flat() returns false.
> >>> The feature isn't set because that would prevent using the file with
> an older version of GnuCash for no good reason: There aren't any
> import-map-bayes tags in the new format so there's nothing for the older
> version to mess up.
> >>>
> >>> Regards,
> >>> John Ralls
> >>>
> >> Sounds meaningful, but it's not obvious. The conversion process is not
> much transparent to the user. The user is not informed about conversion and
> additionally the conditions, when the GnuCash file is converted and when
> not, are not quite intuitive, I guess. And the second question for me is,
> if this is really a relevant use case to keep the GnuCash file unconverted
> if possible, when it is used with a newer GnuCash version. Maybe it's
> relevant, if several people, who use different GnuCash versions work on the
> same GnuCash file. But what happens, when one with a newer GnuCash version
> uses the bayesian import matcher for the first time, which silently
> converts the GnuCash file? Will it still be usable with an older GnuCash
> version?
> >>
> >> But actually this is another discussion, which is not important for the
> bug ticket. What's more important, I created a new fresh GnuCash file with
> the SKR04 accounts template using my GnuCash version 3.7 and if I checked
> this correctly, even with this GnuCash version the feature
> GNC_FEATURE_GUID_FLAT_BAYESIAN is not set in the GnuCash file. Can you
> confirm this? If yes, then the bug report is relevant not only for
> migration from older to current GnuCash versions but even for current
> GnuCash versions in general.
> > No, it's the same discussion. If you don't use the feature then the flag
> for it won't be set. That's how we designed the feature mechanism. The use
> case is straightforward: Some user has two computers, a laptop running
> Windows and a desktop running Ubuntu 16.04. She updates the Windows machine
> with each release but doesn't know how to build GnuCash on her Ubuntu
> machine so leaves it at whatever they shipped on 16.04; 2.6.12 IIRC. As
> long as she doesn't use any of the features supported only in 3.x she can
> continue to use both systems to keep her books and regardless of which
> machine she uses to create a book.
> >
> > The "silently" part could be better. It would indeed be a nice
> enhancement to pop a dialog box when GnuCash is about to set a feature flag
> explaining that it's use will preclude using the file on a GnuCash version
> older than X and giving the user a chance to bail out. It would be nicer
> still to persist that decision so that it need ask only once, and if the
> user assents it could set the feature flag even if the feature isn't used,
> as is the case here. Of course there would then be the need to reset
> negative responses so that when our hypothetical user upgrades her desktop
> to Ubuntu 20.04 next spring she can turn on all of those 3.x only features
> she'd previously declined.
> >
> > It occurs to me that another improvement to the existing flat Bayesian
> conversion would be to check the state of the GNC_PREF_USE_BAYES and skip
> the conversion if it's not true.
> This is already implemented. Flag GNC_PREF_USE_BAYES is checked in
> functions matchmap_find_destination() and matchmap_store_destination()
> and if not set, than functions gnc_account_imap_find_account_bayes() and
> gnc_account_imap_add_account_bayes() aren't invoked and no conversion
> will be done.
> >
> > Regards,
> > John Ralls
> >
> Thanks for the revealing explanation of the feature mechanism. In this
> case it would be really helpful, if GnuCash would explain itself and
> interact with the user, when features are changed and conversions are
> done, which are not backwards compatible. Moreover it would be
> meaningful, if the user could "enable/disable" backwards compatibility
> when creating a new GnuCash file to avoid such extensive conversion
> steps. This means, that all supported features of the used GnuCash
> version are set by default during creation. Alternatively, if it's
> possible to significantly speed up searching for (non-existing) bayesian
> match tokens, this should be preferred.
>
> I'll create an enhancement ticket.
>
> Regards,
> Christian
>
>
> _______________________________________________
> gnucash-devel mailing list
> [hidden email]
> https://lists.gnucash.org/mailman/listinfo/gnucash-devel
>


--
David Carlson
_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] Bug 797463 - CSV Import of transactions into a new file hangs

Geert Janssens-4
In reply to this post by Christian Gruber
Op zondag 10 november 2019 23:28:06 CET schreef Christian Gruber:

> Am 10.11.19 um 22:39 schrieb John Ralls:
> >> On Nov 10, 2019, at 12:07 PM, Christian Gruber
> >> <[hidden email]> wrote:>>
> >> Am 08.11.19 um 23:08 schrieb John Ralls:
> >>>> On Nov 8, 2019, at 1:58 PM, Christian Gruber
> >>>> <[hidden email]> wrote:>>>>
> >>>> Am 08.11.19 um 04:39 schrieb John Ralls:
> >>>>> Christian,
> >>>>>
> >>>>> It's not that it's not prepared for Bayesian matching, it's that older
> >>>>> versions of GnuCash stored the Bayesian match tokens hierarchically.
> >>>>> Aaron Laws (lmat) changed it to a flatter structure with somewhat
> >>>>> better memory locality for faster access. imap_convert_bayes_to_flat
> >>>>> should run once to convert the data and set the feature, after which
> >>>>> check_import_map_data will see the flag and return. A file created
> >>>>> with 3.x and Baysian maps would already have the feature set.
> >>>>>
> >>>>>
> >>>>> With that background, to your questions:
> >>>>>
> >>>>> Why does it take so long? Because it traverses the entire tree of
> >>>>> accounts, every time. The test book has 1127 accounts. Add to that
> >>>>> that there are some things inside the loop that shouldn't be and that
> >>>>> convert_imap_account_bayes_to_flat doesn't use some obvious short
> >>>>> circuits and you get taking a long time.
> >>>>>
> >>>>> Why does it run twice? Because there aren't any accounts with
> >>>>> import-map-bayes slots, so it does no conversions so it doesn't set
> >>>>> the feature.>>>>
> >>>> Why isn't the feature set in any case after conversion is done, whether
> >>>> there was any slot to convert or not?
> >>>>
> >>>> I would expect that, if the hierarchical structure should not be used
> >>>> anymore. I wouldn't only set the feature, if there has been any error
> >>>> during conversion. But it's not an error, if
> >>>> convert_imap_account_bayes_to_flat() returns false.>>>
> >>> The feature isn't set because that would prevent using the file with an
> >>> older version of GnuCash for no good reason: There aren't any
> >>> import-map-bayes tags in the new format so there's nothing for the
> >>> older version to mess up.
> >>>
> >>> Regards,
> >>> John Ralls
> >>
> >> Sounds meaningful, but it's not obvious. The conversion process is not
> >> much transparent to the user. The user is not informed about conversion
> >> and additionally the conditions, when the GnuCash file is converted and
> >> when not, are not quite intuitive, I guess. And the second question for
> >> me is, if this is really a relevant use case to keep the GnuCash file
> >> unconverted if possible, when it is used with a newer GnuCash version.
> >> Maybe it's relevant, if several people, who use different GnuCash
> >> versions work on the same GnuCash file. But what happens, when one with
> >> a newer GnuCash version uses the bayesian import matcher for the first
> >> time, which silently converts the GnuCash file? Will it still be usable
> >> with an older GnuCash version?
> >>
> >> But actually this is another discussion, which is not important for the
> >> bug ticket. What's more important, I created a new fresh GnuCash file
> >> with the SKR04 accounts template using my GnuCash version 3.7 and if I
> >> checked this correctly, even with this GnuCash version the feature
> >> GNC_FEATURE_GUID_FLAT_BAYESIAN is not set in the GnuCash file. Can you
> >> confirm this? If yes, then the bug report is relevant not only for
> >> migration from older to current GnuCash versions but even for current
> >> GnuCash versions in general.>
> > No, it's the same discussion. If you don't use the feature then the flag
> > for it won't be set. That's how we designed the feature mechanism. The
> > use case is straightforward: Some user has two computers, a laptop
> > running Windows and a desktop running Ubuntu 16.04. She updates the
> > Windows machine with each release but doesn't know how to build GnuCash
> > on her Ubuntu machine so leaves it at whatever they shipped on 16.04;
> > 2.6.12 IIRC. As long as she doesn't use any of the features supported
> > only in 3.x she can continue to use both systems to keep her books and
> > regardless of which machine she uses to create a book.
> >
> > The "silently" part could be better. It would indeed be a nice enhancement
> > to pop a dialog box when GnuCash is about to set a feature flag
> > explaining that it's use will preclude using the file on a GnuCash
> > version older than X and giving the user a chance to bail out. It would
> > be nicer still to persist that decision so that it need ask only once,
> > and if the user assents it could set the feature flag even if the feature
> > isn't used, as is the case here. Of course there would then be the need
> > to reset negative responses so that when our hypothetical user upgrades
> > her desktop to Ubuntu 20.04 next spring she can turn on all of those 3.x
> > only features she'd previously declined.
> >
> > It occurs to me that another improvement to the existing flat Bayesian
> > conversion would be to check the state of the GNC_PREF_USE_BAYES and skip
> > the conversion if it's not true.
> This is already implemented. Flag GNC_PREF_USE_BAYES is checked in
> functions matchmap_find_destination() and matchmap_store_destination()
> and if not set, than functions gnc_account_imap_find_account_bayes() and
> gnc_account_imap_add_account_bayes() aren't invoked and no conversion
> will be done.
>
> > Regards,
> > John Ralls
>
> Thanks for the revealing explanation of the feature mechanism. In this
> case it would be really helpful, if GnuCash would explain itself and
> interact with the user, when features are changed and conversions are
> done, which are not backwards compatible. Moreover it would be
> meaningful, if the user could "enable/disable" backwards compatibility
> when creating a new GnuCash file to avoid such extensive conversion
> steps. This means, that all supported features of the used GnuCash
> version are set by default during creation. Alternatively, if it's
> possible to significantly speed up searching for (non-existing) bayesian
> match tokens, this should be preferred.
>
> I'll create an enhancement ticket.
>
> Regards,
> Christian

I fully agree there should at least be visual feedback to the user when
running the conversions. There are technical challenges in accomplishing that
in this case though.

Another potential performance improvement would be to ensure the conversion
will run only once per session, even if the feature flag doesn't get set. At
least that would mean the slowdown happens only the first time one tries to
import during a longer session.

Regards,

Geert


_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] Bug 797463 - CSV Import of transactions into a new file hangs

Christian Gruber
In reply to this post by Christian Gruber

Am 10.11.19 um 23:28 schrieb Christian Gruber:

>
> Am 10.11.19 um 22:39 schrieb John Ralls:
>>
>>> On Nov 10, 2019, at 12:07 PM, Christian Gruber
>>> <[hidden email]> wrote:
>>>
>>>
>>> Am 08.11.19 um 23:08 schrieb John Ralls:
>>>>> On Nov 8, 2019, at 1:58 PM, Christian Gruber
>>>>> <[hidden email]> wrote:
>>>>>
>>>>> Am 08.11.19 um 04:39 schrieb John Ralls:
>>>>>> Christian,
>>>>>>
>>>>>> It's not that it's not prepared for Bayesian matching, it's that
>>>>>> older versions of GnuCash stored the Bayesian match tokens
>>>>>> hierarchically. Aaron Laws (lmat) changed it to a flatter
>>>>>> structure with somewhat better memory locality for faster access.
>>>>>> imap_convert_bayes_to_flat should run once to convert the data
>>>>>> and set the feature, after which check_import_map_data will see
>>>>>> the flag and return. A file created with 3.x and Baysian maps
>>>>>> would already have the feature set.
>>>>>>
>>>>>>
>>>>>> With that background, to your questions:
>>>>>>
>>>>>> Why does it take so long? Because it traverses the entire tree of
>>>>>> accounts, every time. The test book has 1127 accounts. Add to
>>>>>> that that there are some things inside the loop that shouldn't be
>>>>>> and that convert_imap_account_bayes_to_flat doesn't use some
>>>>>> obvious short circuits and you get taking a long time.
>>>>>>
>>>>>> Why does it run twice? Because there aren't any accounts with
>>>>>> import-map-bayes slots, so it does no conversions so it doesn't
>>>>>> set the feature.
>>>>> Why isn't the feature set in any case after conversion is done,
>>>>> whether there was any slot to convert or not?
>>>>>
>>>>> I would expect that, if the hierarchical structure should not be
>>>>> used anymore. I wouldn't only set the feature, if there has been
>>>>> any error during conversion. But it's not an error, if
>>>>> convert_imap_account_bayes_to_flat() returns false.
>>>> The feature isn't set because that would prevent using the file
>>>> with an older version of GnuCash for no good reason: There aren't
>>>> any import-map-bayes tags in the new format so there's nothing for
>>>> the older version to mess up.
>>>>
>>>> Regards,
>>>> John Ralls
>>>>
>>> Sounds meaningful, but it's not obvious. The conversion process is
>>> not much transparent to the user. The user is not informed about
>>> conversion and additionally the conditions, when the GnuCash file is
>>> converted and when not, are not quite intuitive, I guess. And the
>>> second question for me is, if this is really a relevant use case to
>>> keep the GnuCash file unconverted if possible, when it is used with
>>> a newer GnuCash version. Maybe it's relevant, if several people, who
>>> use different GnuCash versions work on the same GnuCash file. But
>>> what happens, when one with a newer GnuCash version uses the
>>> bayesian import matcher for the first time, which silently converts
>>> the GnuCash file? Will it still be usable with an older GnuCash
>>> version?
>>>
>>> But actually this is another discussion, which is not important for
>>> the bug ticket. What's more important, I created a new fresh GnuCash
>>> file with the SKR04 accounts template using my GnuCash version 3.7
>>> and if I checked this correctly, even with this GnuCash version the
>>> feature GNC_FEATURE_GUID_FLAT_BAYESIAN is not set in the GnuCash
>>> file. Can you confirm this? If yes, then the bug report is relevant
>>> not only for migration from older to current GnuCash versions but
>>> even for current GnuCash versions in general.
>> No, it's the same discussion. If you don't use the feature then the
>> flag for it won't be set. That's how we designed the feature
>> mechanism. The use case is straightforward: Some user has two
>> computers, a laptop running Windows and a desktop running Ubuntu
>> 16.04. She updates the Windows machine with each release but doesn't
>> know how to build GnuCash on her Ubuntu machine so leaves it at
>> whatever they shipped on 16.04; 2.6.12 IIRC. As long as she doesn't
>> use any of the features supported only in 3.x she can continue to use
>> both systems to keep her books and regardless of which machine she
>> uses to create a book.
>>
>> The "silently" part could be better. It would indeed be a nice
>> enhancement to pop a dialog box when GnuCash is about to set a
>> feature flag explaining that it's use will preclude using the file on
>> a GnuCash version older than X and giving the user a chance to bail
>> out. It would be nicer still to persist that decision so that it need
>> ask only once, and if the user assents it could set the feature flag
>> even if the feature isn't used, as is the case here. Of course there
>> would then be the need to reset negative responses so that when our
>> hypothetical user upgrades her desktop to Ubuntu 20.04 next spring
>> she can turn on all of those 3.x only features she'd previously
>> declined.
>>
>> It occurs to me that another improvement to the existing flat
>> Bayesian conversion would be to check the state of the
>> GNC_PREF_USE_BAYES and skip the conversion if it's not true.
> This is already implemented. Flag GNC_PREF_USE_BAYES is checked in
> functions matchmap_find_destination() and matchmap_store_destination()
> and if not set, than functions gnc_account_imap_find_account_bayes()
> and gnc_account_imap_add_account_bayes() aren't invoked and no
> conversion will be done.
>>
>> Regards,
>> John Ralls
>>
> Thanks for the revealing explanation of the feature mechanism. In this
> case it would be really helpful, if GnuCash would explain itself and
> interact with the user, when features are changed and conversions are
> done, which are not backwards compatible. Moreover it would be
> meaningful, if the user could "enable/disable" backwards compatibility
> when creating a new GnuCash file to avoid such extensive conversion
> steps. This means, that all supported features of the used GnuCash
> version are set by default during creation. Alternatively, if it's
> possible to significantly speed up searching for (non-existing)
> bayesian match tokens, this should be preferred.
>
> I'll create an enhancement ticket.
>
> Regards,
> Christian
>
>
I added enhancement ticket 797499
<https://bugs.gnucash.org/show_bug.cgi?id=797499>.


_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] Bug 797463 - CSV Import of transactions into a new file hangs

Christian Gruber
In reply to this post by Geert Janssens-4

Am 11.11.19 um 11:22 schrieb Geert Janssens:

> Op zondag 10 november 2019 23:28:06 CET schreef Christian Gruber:
>> Am 10.11.19 um 22:39 schrieb John Ralls:
>>>> On Nov 10, 2019, at 12:07 PM, Christian Gruber
>>>> <[hidden email]> wrote:>>
>>>> Am 08.11.19 um 23:08 schrieb John Ralls:
>>>>>> On Nov 8, 2019, at 1:58 PM, Christian Gruber
>>>>>> <[hidden email]> wrote:>>>>
>>>>>> Am 08.11.19 um 04:39 schrieb John Ralls:
>>>>>>> Christian,
>>>>>>>
>>>>>>> It's not that it's not prepared for Bayesian matching, it's that older
>>>>>>> versions of GnuCash stored the Bayesian match tokens hierarchically.
>>>>>>> Aaron Laws (lmat) changed it to a flatter structure with somewhat
>>>>>>> better memory locality for faster access. imap_convert_bayes_to_flat
>>>>>>> should run once to convert the data and set the feature, after which
>>>>>>> check_import_map_data will see the flag and return. A file created
>>>>>>> with 3.x and Baysian maps would already have the feature set.
>>>>>>>
>>>>>>>
>>>>>>> With that background, to your questions:
>>>>>>>
>>>>>>> Why does it take so long? Because it traverses the entire tree of
>>>>>>> accounts, every time. The test book has 1127 accounts. Add to that
>>>>>>> that there are some things inside the loop that shouldn't be and that
>>>>>>> convert_imap_account_bayes_to_flat doesn't use some obvious short
>>>>>>> circuits and you get taking a long time.
>>>>>>>
>>>>>>> Why does it run twice? Because there aren't any accounts with
>>>>>>> import-map-bayes slots, so it does no conversions so it doesn't set
>>>>>>> the feature.>>>>
>>>>>> Why isn't the feature set in any case after conversion is done, whether
>>>>>> there was any slot to convert or not?
>>>>>>
>>>>>> I would expect that, if the hierarchical structure should not be used
>>>>>> anymore. I wouldn't only set the feature, if there has been any error
>>>>>> during conversion. But it's not an error, if
>>>>>> convert_imap_account_bayes_to_flat() returns false.>>>
>>>>> The feature isn't set because that would prevent using the file with an
>>>>> older version of GnuCash for no good reason: There aren't any
>>>>> import-map-bayes tags in the new format so there's nothing for the
>>>>> older version to mess up.
>>>>>
>>>>> Regards,
>>>>> John Ralls
>>>> Sounds meaningful, but it's not obvious. The conversion process is not
>>>> much transparent to the user. The user is not informed about conversion
>>>> and additionally the conditions, when the GnuCash file is converted and
>>>> when not, are not quite intuitive, I guess. And the second question for
>>>> me is, if this is really a relevant use case to keep the GnuCash file
>>>> unconverted if possible, when it is used with a newer GnuCash version.
>>>> Maybe it's relevant, if several people, who use different GnuCash
>>>> versions work on the same GnuCash file. But what happens, when one with
>>>> a newer GnuCash version uses the bayesian import matcher for the first
>>>> time, which silently converts the GnuCash file? Will it still be usable
>>>> with an older GnuCash version?
>>>>
>>>> But actually this is another discussion, which is not important for the
>>>> bug ticket. What's more important, I created a new fresh GnuCash file
>>>> with the SKR04 accounts template using my GnuCash version 3.7 and if I
>>>> checked this correctly, even with this GnuCash version the feature
>>>> GNC_FEATURE_GUID_FLAT_BAYESIAN is not set in the GnuCash file. Can you
>>>> confirm this? If yes, then the bug report is relevant not only for
>>>> migration from older to current GnuCash versions but even for current
>>>> GnuCash versions in general.>
>>> No, it's the same discussion. If you don't use the feature then the flag
>>> for it won't be set. That's how we designed the feature mechanism. The
>>> use case is straightforward: Some user has two computers, a laptop
>>> running Windows and a desktop running Ubuntu 16.04. She updates the
>>> Windows machine with each release but doesn't know how to build GnuCash
>>> on her Ubuntu machine so leaves it at whatever they shipped on 16.04;
>>> 2.6.12 IIRC. As long as she doesn't use any of the features supported
>>> only in 3.x she can continue to use both systems to keep her books and
>>> regardless of which machine she uses to create a book.
>>>
>>> The "silently" part could be better. It would indeed be a nice enhancement
>>> to pop a dialog box when GnuCash is about to set a feature flag
>>> explaining that it's use will preclude using the file on a GnuCash
>>> version older than X and giving the user a chance to bail out. It would
>>> be nicer still to persist that decision so that it need ask only once,
>>> and if the user assents it could set the feature flag even if the feature
>>> isn't used, as is the case here. Of course there would then be the need
>>> to reset negative responses so that when our hypothetical user upgrades
>>> her desktop to Ubuntu 20.04 next spring she can turn on all of those 3.x
>>> only features she'd previously declined.
>>>
>>> It occurs to me that another improvement to the existing flat Bayesian
>>> conversion would be to check the state of the GNC_PREF_USE_BAYES and skip
>>> the conversion if it's not true.
>> This is already implemented. Flag GNC_PREF_USE_BAYES is checked in
>> functions matchmap_find_destination() and matchmap_store_destination()
>> and if not set, than functions gnc_account_imap_find_account_bayes() and
>> gnc_account_imap_add_account_bayes() aren't invoked and no conversion
>> will be done.
>>
>>> Regards,
>>> John Ralls
>> Thanks for the revealing explanation of the feature mechanism. In this
>> case it would be really helpful, if GnuCash would explain itself and
>> interact with the user, when features are changed and conversions are
>> done, which are not backwards compatible. Moreover it would be
>> meaningful, if the user could "enable/disable" backwards compatibility
>> when creating a new GnuCash file to avoid such extensive conversion
>> steps. This means, that all supported features of the used GnuCash
>> version are set by default during creation. Alternatively, if it's
>> possible to significantly speed up searching for (non-existing) bayesian
>> match tokens, this should be preferred.
>>
>> I'll create an enhancement ticket.
>>
>> Regards,
>> Christian
> I fully agree there should at least be visual feedback to the user when
> running the conversions. There are technical challenges in accomplishing that
> in this case though.
>
> Another potential performance improvement would be to ensure the conversion
> will run only once per session, even if the feature flag doesn't get set. At
> least that would mean the slowdown happens only the first time one tries to
> import during a longer session.
>
> Regards,
>
> Geert
>
>
I thought again about the case, that the user confirms conversion, but
the conversion is actually not necessary, since there was nothing to
convert like in this case.

This could also be seen as an implementation problem. If GnuCash could
exactly determine, when a conversion will be necessary, before asking
the user, then the problem would be solved.

But in this case it's actually not only the conversion itself, which
needs extensive processing time. It is even the process of checking, if
a conversion would be necessary or not. So one solution could also be,
that GnuCash somehow stores the result of this check, which is different
from setting the feature flag GNC_FEATURE_GUID_FLAT_BAYESIAN.

Futhermore, if GnuCash should check, if the conversion is really
necessary, before asking the user, the current implementation has to be
changed before adding user interaction, since both the check and the
conversion itself are implemented in the same function
imap_convert_bayes_to_flat() currently.

On the other hand it's not that bad to ask the user even in uncertain
cases, i.e. when it's only assumed, that a conversion will be necessary.
But if the user then confirms conversion, it must be performed, i.e. the
feature flag has to be set anyway.

Regards,
Christian


_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Reply | Threaded
Open this post in threaded view
|

Re: [GNC-dev] Bug 797463 - CSV Import of transactions into a new file hangs

John Ralls-2


> On Nov 11, 2019, at 2:24 PM, Christian Gruber <[hidden email]> wrote:
>
>
> Am 11.11.19 um 11:22 schrieb Geert Janssens:
>> Op zondag 10 november 2019 23:28:06 CET schreef Christian Gruber:
>>> Am 10.11.19 um 22:39 schrieb John Ralls:
>>>>> On Nov 10, 2019, at 12:07 PM, Christian Gruber
>>>>> <[hidden email]> wrote:>>
>>>>> Am 08.11.19 um 23:08 schrieb John Ralls:
>>>>>>> On Nov 8, 2019, at 1:58 PM, Christian Gruber
>>>>>>> <[hidden email]> wrote:>>>>
>>>>>>> Am 08.11.19 um 04:39 schrieb John Ralls:
>>>>>>>> Christian,
>>>>>>>>
>>>>>>>> It's not that it's not prepared for Bayesian matching, it's that older
>>>>>>>> versions of GnuCash stored the Bayesian match tokens hierarchically.
>>>>>>>> Aaron Laws (lmat) changed it to a flatter structure with somewhat
>>>>>>>> better memory locality for faster access. imap_convert_bayes_to_flat
>>>>>>>> should run once to convert the data and set the feature, after which
>>>>>>>> check_import_map_data will see the flag and return. A file created
>>>>>>>> with 3.x and Baysian maps would already have the feature set.
>>>>>>>>
>>>>>>>>
>>>>>>>> With that background, to your questions:
>>>>>>>>
>>>>>>>> Why does it take so long? Because it traverses the entire tree of
>>>>>>>> accounts, every time. The test book has 1127 accounts. Add to that
>>>>>>>> that there are some things inside the loop that shouldn't be and that
>>>>>>>> convert_imap_account_bayes_to_flat doesn't use some obvious short
>>>>>>>> circuits and you get taking a long time.
>>>>>>>>
>>>>>>>> Why does it run twice? Because there aren't any accounts with
>>>>>>>> import-map-bayes slots, so it does no conversions so it doesn't set
>>>>>>>> the feature.>>>>
>>>>>>> Why isn't the feature set in any case after conversion is done, whether
>>>>>>> there was any slot to convert or not?
>>>>>>>
>>>>>>> I would expect that, if the hierarchical structure should not be used
>>>>>>> anymore. I wouldn't only set the feature, if there has been any error
>>>>>>> during conversion. But it's not an error, if
>>>>>>> convert_imap_account_bayes_to_flat() returns false.>>>
>>>>>> The feature isn't set because that would prevent using the file with an
>>>>>> older version of GnuCash for no good reason: There aren't any
>>>>>> import-map-bayes tags in the new format so there's nothing for the
>>>>>> older version to mess up.
>>>>>>
>>>>>> Regards,
>>>>>> John Ralls
>>>>> Sounds meaningful, but it's not obvious. The conversion process is not
>>>>> much transparent to the user. The user is not informed about conversion
>>>>> and additionally the conditions, when the GnuCash file is converted and
>>>>> when not, are not quite intuitive, I guess. And the second question for
>>>>> me is, if this is really a relevant use case to keep the GnuCash file
>>>>> unconverted if possible, when it is used with a newer GnuCash version.
>>>>> Maybe it's relevant, if several people, who use different GnuCash
>>>>> versions work on the same GnuCash file. But what happens, when one with
>>>>> a newer GnuCash version uses the bayesian import matcher for the first
>>>>> time, which silently converts the GnuCash file? Will it still be usable
>>>>> with an older GnuCash version?
>>>>>
>>>>> But actually this is another discussion, which is not important for the
>>>>> bug ticket. What's more important, I created a new fresh GnuCash file
>>>>> with the SKR04 accounts template using my GnuCash version 3.7 and if I
>>>>> checked this correctly, even with this GnuCash version the feature
>>>>> GNC_FEATURE_GUID_FLAT_BAYESIAN is not set in the GnuCash file. Can you
>>>>> confirm this? If yes, then the bug report is relevant not only for
>>>>> migration from older to current GnuCash versions but even for current
>>>>> GnuCash versions in general.>
>>>> No, it's the same discussion. If you don't use the feature then the flag
>>>> for it won't be set. That's how we designed the feature mechanism. The
>>>> use case is straightforward: Some user has two computers, a laptop
>>>> running Windows and a desktop running Ubuntu 16.04. She updates the
>>>> Windows machine with each release but doesn't know how to build GnuCash
>>>> on her Ubuntu machine so leaves it at whatever they shipped on 16.04;
>>>> 2.6.12 IIRC. As long as she doesn't use any of the features supported
>>>> only in 3.x she can continue to use both systems to keep her books and
>>>> regardless of which machine she uses to create a book.
>>>>
>>>> The "silently" part could be better. It would indeed be a nice enhancement
>>>> to pop a dialog box when GnuCash is about to set a feature flag
>>>> explaining that it's use will preclude using the file on a GnuCash
>>>> version older than X and giving the user a chance to bail out. It would
>>>> be nicer still to persist that decision so that it need ask only once,
>>>> and if the user assents it could set the feature flag even if the feature
>>>> isn't used, as is the case here. Of course there would then be the need
>>>> to reset negative responses so that when our hypothetical user upgrades
>>>> her desktop to Ubuntu 20.04 next spring she can turn on all of those 3.x
>>>> only features she'd previously declined.
>>>>
>>>> It occurs to me that another improvement to the existing flat Bayesian
>>>> conversion would be to check the state of the GNC_PREF_USE_BAYES and skip
>>>> the conversion if it's not true.
>>> This is already implemented. Flag GNC_PREF_USE_BAYES is checked in
>>> functions matchmap_find_destination() and matchmap_store_destination()
>>> and if not set, than functions gnc_account_imap_find_account_bayes() and
>>> gnc_account_imap_add_account_bayes() aren't invoked and no conversion
>>> will be done.
>>>
>>>> Regards,
>>>> John Ralls
>>> Thanks for the revealing explanation of the feature mechanism. In this
>>> case it would be really helpful, if GnuCash would explain itself and
>>> interact with the user, when features are changed and conversions are
>>> done, which are not backwards compatible. Moreover it would be
>>> meaningful, if the user could "enable/disable" backwards compatibility
>>> when creating a new GnuCash file to avoid such extensive conversion
>>> steps. This means, that all supported features of the used GnuCash
>>> version are set by default during creation. Alternatively, if it's
>>> possible to significantly speed up searching for (non-existing) bayesian
>>> match tokens, this should be preferred.
>>>
>>> I'll create an enhancement ticket.
>>>
>>> Regards,
>>> Christian
>> I fully agree there should at least be visual feedback to the user when
>> running the conversions. There are technical challenges in accomplishing that
>> in this case though.
>>
>> Another potential performance improvement would be to ensure the conversion
>> will run only once per session, even if the feature flag doesn't get set. At
>> least that would mean the slowdown happens only the first time one tries to
>> import during a longer session.
>>
>> Regards,
>>
>> Geert
>>
>>
> I thought again about the case, that the user confirms conversion, but the conversion is actually not necessary, since there was nothing to convert like in this case.
>
> This could also be seen as an implementation problem. If GnuCash could exactly determine, when a conversion will be necessary, before asking the user, then the problem would be solved.
>
> But in this case it's actually not only the conversion itself, which needs extensive processing time. It is even the process of checking, if a conversion would be necessary or not. So one solution could also be, that GnuCash somehow stores the result of this check, which is different from setting the feature flag GNC_FEATURE_GUID_FLAT_BAYESIAN.
>
> Futhermore, if GnuCash should check, if the conversion is really necessary, before asking the user, the current implementation has to be changed before adding user interaction, since both the check and the conversion itself are implemented in the same function imap_convert_bayes_to_flat() currently.
>
> On the other hand it's not that bad to ask the user even in uncertain cases, i.e. when it's only assumed, that a conversion will be necessary. But if the user then confirms conversion, it must be performed, i.e. the feature flag has to be set anyway.

Christian,

It would be great if convert_imap_account_bayes_to_flat could run once and set some "no import-bayes-maps" flag, but the user could decide to do an import some following session and make that setting not true. We can of course fix that by also having the bayes matcher unset the flag if it's set, but that wouldn't help the user who decides to do that import on GnuCash 2.6.21.

Regards,
John Ralls

_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel