Fixing confused bayesian matching data?

Philip Matthews
Just wondering if anyone has any advice on what to do with some very confused bayesian matching data?

Right now, when I import new transactions (either CSV or QFX), they mostly don't find a match anymore. Only around 20 - 30% match.   This is probably because I like to rejig my accounts from time to time as I continue to figure out what works best for me.  Looking through the ".gnucash" file, I see lots of slot entries with account names that don't exist any more.

For a while, I was just putting up with it and assigning transactions to accounts by hand, but now I am starting to get tired of this.

A couple of options have occurred to me:

1. Just delete everything between <act:slots> and </act:slots> for each account.  This is simple, if rather drastic. But do I do this again in a month when I make another small change to my account structure?

2. Write a Python program that goes through the .gnucash file and deletes slot entries that point at accounts that don't exist any more.
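
Option 2 could look roughly like the following. This is a minimal sketch, not a tested tool. It assumes the file was saved uncompressed, that 2.6 stores the Bayes data in each account's "import-map-bayes" slot as nested frames (token -> account full name -> count), and that the namespace URIs below match the xmlns declarations at the top of the file. The function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed to match the xmlns declarations in the file header.
NS = {
    "gnc": "http://www.gnucash.org/XML/gnc",
    "act": "http://www.gnucash.org/XML/act",
    "slot": "http://www.gnucash.org/XML/slot",
}
ACCOUNT_TAG = "{%s}account" % NS["gnc"]


def full_account_names(root):
    """Return the colon-separated full name of every non-root account."""
    by_guid = {}
    for acct in root.iter(ACCOUNT_TAG):
        guid = acct.findtext("act:id", namespaces=NS)
        name = acct.findtext("act:name", namespaces=NS)
        parent = acct.findtext("act:parent", namespaces=NS)  # None for the root
        by_guid[guid] = (name, parent)

    names = set()
    for start in by_guid:
        parts, current = [], start
        while current in by_guid:
            name, parent = by_guid[current]
            if parent is None:  # root account: not part of the full name
                break
            parts.append(name)
            current = parent
        if parts:
            names.add(":".join(reversed(parts)))
    return names


def prune_stale_bayes(root):
    """Remove Bayes entries whose target account name no longer exists.

    Returns the number of (token, account) entries removed.
    """
    valid = full_account_names(root)
    removed = 0
    for acct in root.iter(ACCOUNT_TAG):
        for top in acct.findall("act:slots/slot", NS):
            if top.findtext("slot:key", namespaces=NS) != "import-map-bayes":
                continue
            token_frame = top.find("slot:value", NS)
            for token in list(token_frame.findall("slot")):
                acct_frame = token.find("slot:value", NS)
                for entry in list(acct_frame.findall("slot")):
                    if entry.findtext("slot:key", namespaces=NS) not in valid:
                        acct_frame.remove(entry)
                        removed += 1
                if len(acct_frame) == 0:  # token has no targets left
                    token_frame.remove(token)
    return removed
```

Typical use would be to load the file with ET.parse(), call prune_stale_bayes() on the root element, and write the tree to a new file. ElementTree rewrites namespace prefixes on output unless they are registered with ET.register_namespace(), and whether GnuCash accepts the rewritten file has not been verified here, so work on a copy.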

Comments?  Other thoughts?

Running GnuCash 2.6.11 on a Mac.

Philip
_______________________________________________
gnucash-user mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-user
-----
Please remember to CC this list on all your replies.
You can do this by using Reply-To-List or Reply-All.

Re: Fixing confused bayesian matching data?

John Ralls-2

> On Jul 17, 2016, at 6:24 PM, Philip Matthews <[hidden email]> wrote:
>
> Just wondering if anyone has any advice on what to do with some very confused bayesian matching data?
>
> Right now, when I import new transactions (either CSV or QFX), they mostly don't find a match anymore. Only around 20 - 30% match.   This is probably because I like to rejig my accounts from time to time as I continue to figure out what works best for me.  Looking through the ".gnucash" file, I see lots of slot entries with account names that don't exist any more.
>
> For a while, I was just putting up with it and assigning transactions to accounts by hand, but now I am starting to get tired of this.
>
> A couple of options have occurred to me:
>
> 1. Just delete everything between <act:slots> and </act:slots> for each account.  This is simple, if rather drastic. But do I do this again in a month when I make another small change to my account structure?
>
> 2. Write a Python program that goes through the .gnucash file and deletes slot entries that point at accounts that don't exist any more.
>
> Comments?  Other thoughts?

The next major version of GnuCash (due around the end of next year) has a new dialog for deleting old match data contributed by Robert Fewell. It also changes the Bayesian matcher to use account GUIDs instead of names (as the plain-string matcher already does) to make it a bit more resistant to reorganization.

That doesn't do anything for you now, of course. New training will eventually override old training but if there's a lot of data already matched it could take a long time.
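
To illustrate why name-keyed training goes stale after a reorganization, and why new training only slowly overrides it, here is a deliberately simplified sketch. It is not GnuCash's matcher: it scores accounts by summed per-token frequencies rather than a real Bayesian computation, and the class and method names are hypothetical.

```python
from collections import defaultdict


class ToyMatcher:
    """Toy token matcher keyed on account *names* (not GnuCash's actual code)."""

    def __init__(self):
        # token -> {account name -> times that token was filed under it}
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, tokens, account_name):
        """Record that a transaction with these tokens went to account_name."""
        for token in tokens:
            self.counts[token][account_name] += 1

    def best_name(self, tokens):
        """Return the name with the highest summed per-token frequency."""
        scores = {}
        for token in tokens:
            seen = self.counts.get(token)
            if not seen:
                continue  # unknown token carries no information
            total = sum(seen.values())
            for name, n in seen.items():
                scores[name] = scores.get(name, 0.0) + n / total
        return max(scores, key=scores.get) if scores else None

    def match(self, tokens, existing_accounts):
        """A learned name only helps if an account by that name still exists."""
        name = self.best_name(tokens)
        return name if name in existing_accounts else None


matcher = ToyMatcher()
for _ in range(3):
    matcher.train(["STARBUCKS"], "Expenses:Coffee")

# The account is then moved; the old name no longer exists.
existing = {"Expenses:Dining:Coffee"}
print(matcher.match(["STARBUCKS"], existing))  # old name still wins, so no match
```

In this toy, the import keeps failing to match until the new name has been trained more often than the old one, which is the "could take a long time" effect described above.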

You can carefully edit out the match data from your file if you insist. Make a backup or two first and test carefully after your edit!
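
The "backup first" step can be scripted so it is never skipped. A minimal sketch, assuming the book is an XML file that may or may not have been saved with compression turned on; the function name and the ".bak" naming scheme are hypothetical:

```python
import gzip
import shutil
import time
from pathlib import Path


def backup_and_unpack(book_path):
    """Copy the book aside, then return (backup path, uncompressed XML bytes)."""
    book = Path(book_path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = book.with_name(book.name + "." + stamp + ".bak")
    shutil.copy2(book, backup)  # the original file is never modified here

    raw = book.read_bytes()
    if raw[:2] == b"\x1f\x8b":  # gzip magic number: file was saved compressed
        raw = gzip.decompress(raw)
    return backup, raw
```

Any edit can then be made on the returned bytes and written to a new file, leaving both the original and the backup untouched until the result has been tested in GnuCash.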

Regards,
John Ralls



Re: Fixing confused bayesian matching data?

Wm...
In reply to this post by Philip Matthews
On Sun, 17 Jul 2016 21:24:56 -0400, in gmane.comp.gnome.apps.gnucash.user,
Philip Matthews <[hidden email]> wrote:

> Just wondering if anyone has any advice on what to do with some very confused bayesian matching data?
>
> Right now, when I import new transactions (either CSV or QFX), they mostly don't find a match anymore. Only around 20 - 30% match.   This is probably because I like to rejig my accounts from time to time as I continue to figure out what works best for me.  Looking through the ".gnucash" file, I see lots of slot entries with account names that don't exist any more.
>
> For a while, I was just putting up with it and assigning transactions to accounts by hand, but now I am starting to get tired of this.
>
> A couple of options have occurred to me:
>
> 1. Just delete everything between <act:slots> and </act:slots> for each account.  This is simple, if rather drastic. But do I do this again in a month when I make another small change to my account structure?
>
> 2. Write a Python program that goes through the .gnucash file and deletes slot entries that point at accounts that don't exist any more.
>
> Comments?  Other thoughts?
>
> Running GnuCash 2.6.11 on a Mac.

Notwithstanding JohnR's reply, are these one-off
getting-your-accounts-started transactions or something you're going to be
doing on a regular basis?  I'd have thought different answers would be
required depending on which.

--
Wm


Re: Fixing confused bayesian matching data?

ChrisGood
In reply to this post by Philip Matthews
> Message: 2

> Date: Sun, 17 Jul 2016 20:42:46 -0700
> From: John Ralls <[hidden email]>
> To: Philip Matthews <[hidden email]>
> Cc: Gnucash User <[hidden email]>
> Subject: Re: Fixing confused bayesian matching data?
> Message-ID: <[hidden email]>
> Content-Type: text/plain; charset=us-ascii
>
>
> > On Jul 17, 2016, at 6:24 PM, Philip Matthews <[hidden email]> wrote:
> >
> > Just wondering if anyone has any advice on what to do with some very confused bayesian matching data?
> >
> > Right now, when I import new transactions (either CSV or QFX), they mostly don't find a match anymore. Only around 20 - 30% match.   This is probably because I like to rejig my accounts from time to time as I continue to figure out what works best for me.  Looking through the ".gnucash" file, I see lots of slot entries with account names that don't exist any more.
> >
> > For a while, I was just putting up with it and assigning transactions to accounts by hand, but now I am starting to get tired of this.
> >
> > A couple of options have occurred to me:
> >
> > 1. Just delete everything between <act:slots> and </act:slots> for each account.  This is simple, if rather drastic. But do I do this again in a month when I make another small change to my account structure?
> >
> > 2. Write a Python program that goes through the .gnucash file and deletes slot entries that point at accounts that don't exist any more.
> >
> > Comments?  Other thoughts?
>
> The next major version of GnuCash (due around the end of next year) has a new dialog for deleting old match data contributed by Robert Fewell. It also changes the Bayesian matcher to use account GUIDs instead of names (as the plain-string matcher already does) to make it a bit more resistant to reorganization.
>
> That doesn't do anything for you now, of course. New training will eventually override old training but if there's a lot of data already matched it could take a long time.
>
> You can carefully edit out the match data from your file if you insist. Make a backup or two first and test carefully after your edit!
>
> Regards,
> John Ralls
>

Hi Philip,

There is already a perl script to do what you want, although I haven't used
it.
See 'Bayes' in http://wiki.gnucash.org/wiki/Published_tools.

Regards,
Chris Good



Re: Fixing confused bayesian matching data?

Philip Matthews
Thanks to John, Wm, and Chris for their replies.

John wrote:
> The next major version of GnuCash (due around the end of next year) has a new dialog for deleting old match data contributed by Robert Fewell. It also changes the Bayesian matcher to use account GUIDs instead of names (as the plain-string matcher already does) to make it a bit more resistant to reorganization.

End of 2017 is a bit long for me to wait, so I think I will look for another solution.  But good to know that a more complete solution is in the pipeline.

> You can carefully edit out the match data from your file if you insist. Make a backup or two first and test carefully after your edit!

I have already done a couple of hand-editing experiments, but there are just too many slots (pages and pages and pages) to make that effective.


Wm wrote:
> Notwithstanding JohnR's reply are these one off getting-your-accounts-started transactions or something you're going to be
> doing on a regular basis?  I'd have thought different answers required depending.

It has been going on for a while now, and I still have more changes I would like to do. So yes, I think I need a good solution.


Chris wrote:
> There is already a perl script to do what you want, although I haven't used it. See 'Bayes' in http://wiki.gnucash.org/wiki/Published_tools.

Thanks for the pointer!  A quick read seems to indicate that this is very close to what I was thinking of.


I also have a memory of someone replacing the Bayesian matching with something more configurable.  Don't recall the details now, or where I saw it.  If anyone knows, a pointer would be helpful.

- Philip

Re: Fixing confused bayesian matching data?

David Cousens
Philip,

Rather than hand editing the matching data entry by entry, using an editor with pattern-matching capability (e.g. Vi or Vim on Linux) may help speed up the editing.

David
David Cousens

re: Fixing confused bayesian matching data?

GnuCash - User mailing list
In reply to this post by Philip Matthews
Philip Matthews<[hidden email]>  wrote:

> Chris wrote:
> > There is already a perl script to do what you want, although I
> haven't used it. See 'Bayes' in
> http://wiki.gnucash.org/wiki/Published_tools.
>
> Thanks for the pointer!  A quick read seems to indicate that this is
> very close to what I was thinking of.
>

There is a somewhat similar, but more comprehensive perl script attached
to this post:

http://gnucash.1415818.n4.nabble.com/Questions-for-a-fresh-GNUCASH-ledger-in-2016-tp4682289p4682347.html

I have used both scripts. I corrected a couple of issues with the second
one, but otherwise it works very well.

Re: Fixing confused bayesian matching data?

Wm...
In reply to this post by Philip Matthews
On Mon, 18 Jul 2016 19:43:49 -0400, in gmane.comp.gnome.apps.gnucash.user,
Philip Matthews <[hidden email]> wrote:


>> You can carefully edit out the match data from your file if you insist. Make a backup or two first and test carefully after your edit!
>
> I have already done a couple of hand-editing experiments, but there is just too many slots (pages and pages and pages) to make that effective.

You probably don't want to keep the results of all your experiments.

> Wm wrote:
>> Notwithstanding JohnR's reply are these one off getting-your-accounts-started transactions or something you're going to be
>> doing on a regular basis?  I'd have thought different answers required depending.
>
> It has been going on for a while now, and I still have more changes I would like to do. So yes, I think I need a good solution.

You don't want the hangover of umpteen import experiments in your data file
when you eventually decide to get going.

My advice to data-capable people is to massage your input to match the
destination, and keep doing that until it works.  You're going to be using
(or should be planning to use) the eventual file for far longer than the
transfer takes, so your concern should be destination cleanliness, not
transfer efficiency.

If your data is messy, consider running it through one of the
accounting-aware plain-text formats from the ledger-cli family, which I
consider to be a closely related group of neutral formats.  There are tools
in a variety of languages for getting data in and out of GnuCash.

The point is that CSV and the other formats commonly used for exchanging
financial information don't necessarily encompass complete transactions; at
import, before getting going, that is generally what you have unless you've
been using a single-legged excuse for an accounting system.

--
Wm


Re: Fixing confused bayesian matching data?

Philip Matthews
In reply to this post by GnuCash - User mailing list

On 2016-07-19, at 10:47 , Cheryl Wheeler wrote:

> Philip Matthews <[hidden email]> wrote:
>> Chris wrote:
>> > There is already a perl script to do what you want, although I haven't used it. See 'Bayes' in http://wiki.gnucash.org/wiki/Published_tools.
>>
>> Thanks for the pointer!  A quick read seems to indicate that this is very close to what I was thinking of.
>>
>
> There is a somewhat similar, but more comprehensive perl script attached to this post:
>
> http://gnucash.1415818.n4.nabble.com/Questions-for-a-fresh-GNUCASH-ledger-in-2016-tp4682289p4682347.html
>
> I have used both scripts. I corrected a couple of issues with the second one, but otherwise it works very well.

Thanks for this pointer!
Do you have a list of the issues you found?

- Philip

Re: Fixing confused bayesian matching data?

Jim DeLaHunt-3
In reply to this post by Philip Matthews
Philip:

Sorry for the delay in responding to your message. I waited until I had
something useful posted where you could see it.

I was in exactly your situation back in February.

On Sun, 17 Jul 2016 21:24:56 -0400, Philip Matthews
<[hidden email]> wrote:
> Just wondering if anyone has any advice on what to do with some very confused bayesian matching data?
>
> Right now, when I import new transactions (either CSV or QFX), they mostly don't find a match anymore. Only around 20 - 30% match.   This is probably because I like to rejig my accounts from time to time as I continue to figure out what works best for me.  Looking through the ".gnucash" file, I see lots of slot entries with account names that don't exist any more....
>
> ...Write a Python program that goes through the .gnucash file and deletes slot entries that point at accounts that don't exist any more.
>
> Comments?  Other thoughts?
>
> Running GnuCash 2.6.11 on a Mac.

My solution? XSLT processing <https://en.wikipedia.org/wiki/XSLT>.
GnuCash files can be saved as XML format data, and XSLT is a tool for
modifying XML data in a controlled, reliable way. I wrote a set of XSLT
filters which:

 1. lists the Bayes mapping data for each account in a gnucash XML file;
 2. resets the import mapping, by deleting all the Bayes mapping data
    for every account in a gnucash XML file; and
 3. prunes the import mapping data for certain target accounts in a
    gnucash XML file.

A rather brief explanation of the situation and the GnuCash file format,
with listings of all the XSLT filters, is in my freshly-written blog
post, /Resetting GnuCash’s import transaction matching/
<http://blog.jdlh.com/en/2016/07/29/resetting-gnucashs-import-transaction-matching/>.
Take a look. I hope it's helpful.
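
For readers without an XSLT processor, the first two operations (list and reset) can also be sketched in Python. This is not a port of the filters in the blog post. It is an independent sketch that assumes the 2.6 layout of an "import-map-bayes" slot holding nested frames (token -> target account name -> count) and the namespace URIs shown; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed to match the xmlns declarations in the file header.
NS = {
    "gnc": "http://www.gnucash.org/XML/gnc",
    "act": "http://www.gnucash.org/XML/act",
    "slot": "http://www.gnucash.org/XML/slot",
}
ACCOUNT_TAG = "{%s}account" % NS["gnc"]


def _bayes_slots(acct):
    """Yield (slots element, slot element) for each import-map-bayes slot."""
    slots = acct.find("act:slots", NS)
    if slots is None:
        return
    for slot in list(slots.findall("slot")):
        if slot.findtext("slot:key", namespaces=NS) == "import-map-bayes":
            yield slots, slot


def list_bayes(root):
    """Return {account name: {token: {target account name: count}}}."""
    report = {}
    for acct in root.iter(ACCOUNT_TAG):
        name = acct.findtext("act:name", namespaces=NS)
        for _, slot in _bayes_slots(acct):
            tokens = {}
            for token in slot.findall("slot:value/slot", NS):
                targets = {}
                for target in token.findall("slot:value/slot", NS):
                    key = target.findtext("slot:key", namespaces=NS)
                    targets[key] = int(target.findtext("slot:value", namespaces=NS))
                tokens[token.findtext("slot:key", namespaces=NS)] = targets
            report[name] = tokens
    return report


def reset_bayes(root):
    """Delete every import-map-bayes slot; return how many were removed."""
    removed = 0
    for acct in root.iter(ACCOUNT_TAG):
        for slots, slot in _bayes_slots(acct):
            slots.remove(slot)
            removed += 1
    return removed
```

Running list_bayes() before reset_bayes() gives a record of what is about to be discarded. As with any edit to the data file, work on a copy and confirm that GnuCash still opens the result.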

--
     --Jim DeLaHunt, [hidden email]     http://blog.jdlh.com/ (http://jdlh.com/)
       multilingual websites consultant

       157-2906 West Broadway, Vancouver BC V6K 2G8, Canada
          Canada mobile +1-604-376-8953


Re: Fixing confused bayesian matching data?

Lincoln A Baxter
On Fri, 2016-07-29 at 02:55 -0700, Jim DeLaHunt wrote:

> Philip:
>
> Sorry for the delay in responding to your message. I waited until I
> had 
> something useful posted where you could see it.
>
> I was in exactly your situation back in February.
>
> On Sun, 17 Jul 2016 21:24:56 -0400, Philip Matthews 
> <[hidden email]> wrote:
> > Just wondering if anyone has any advice on what to do with some
> very confused bayesian matching data?
> >
> > Right now, when I import new transactions (either CSV or QFX), they
> mostly don't find a match anymore. Only around 20 - 30% match.   This
> is probably because I like to rejig my accounts from time to time as
> I continue to figure out what works best for me.  Looking through the
> ".gnucash" file, I see lots of slot entries with account names that
> don't exist any more....
> >
> > ...Write a Python program that goes through the .gnucash file and
> deletes slot entries that point at accounts that don't exist any
> more.
> >
> > Comments?  Other thoughts?
> >
> > Running GnuCash 2.6.11 on a Mac.
>
> My solution? XSLT processing <https://en.wikipedia.org/wiki/XSLT>. 
> GnuCash files can be saved as XML format data, and XSLT is a tool
> for 
> modifying XML data in a controlled, reliable way. I wrote a set of
> XSLT 
> filters which:
>
>  1. lists the Bayes mapping data for each account in a gnucash XML
> file;
>  2. resets the import mapping, by deleting all the Bayes mapping data
>     for every account in a gnucash XML file; and
>  3. prunes the import mapping data for certain target accounts in a
>     gnucash XML file.
>
> A rather brief explanation of the situation and the GnuCash file
> format, 
> with listings of all the XSLT filters, is in my freshly-written blog 
> post, /Resetting GnuCash’s import transaction matching/
> <http://blog.jdlh.com/en/2016/07/29/resetting-gnucashs-import-transaction-matching/>.
> Take a look. I hope it's helpful.

This prompts me to reply with a newer version of the perl script I
posted some time back to do this (and a whole lot more).  I have also
attached the text of the full man page embedded in this script.

Another GC user (Cheryl Wheeler) provided a patch that is incorporated
into this version.  I have been discussing with Chris Good a more
permanent location for this script, which can then be referenced in the
GnuCash wiki. However, I have not had time to do much work on that,
so I repost the script here for now.

Lincoln




gc_prune_bayes_data.txt (21K) Download Attachment
gc_prune_bayes_data.pl (52K) Download Attachment

Re: Fixing confused bayesian matching data?

John Ralls-2

> On Jul 29, 2016, at 5:48 PM, Lincoln A Baxter <[hidden email]> wrote:
>
> On Fri, 2016-07-29 at 02:55 -0700, Jim DeLaHunt wrote:
>> Philip:
>>
>> Sorry for the delay in responding to your message. I waited until I
>> had
>> something useful posted where you could see it.
>>
>> I was in exactly your situation back in February.
>>
>> On Sun, 17 Jul 2016 21:24:56 -0400, Philip Matthews
>> <[hidden email]> wrote:
>>> Just wondering if anyone has any advice on what to do with some
>> very confused bayesian matching data?
>>>
>>> Right now, when I import new transactions (either CSV or QFX), they
>> mostly don't find a match anymore. Only around 20 - 30% match.   This
>> is probably because I like to rejig my accounts from time to time as
>> I continue to figure out what works best for me.  Looking through the
>> ".gnucash" file, I see lots of slot entries with account names that
>> don't exist any more....
>>>
>>> ...Write a Python program that goes through the .gnucash file and
>> deletes slot entries that point at accounts that don't exist any
>> more.
>>>
>>> Comments?  Other thoughts?
>>>
>>> Running GnuCash 2.6.11 on a Mac.
>>
>> My solution? XSLT processing <https://en.wikipedia.org/wiki/XSLT>.
>> GnuCash files can be saved as XML format data, and XSLT is a tool
>> for
>> modifying XML data in a controlled, reliable way. I wrote a set of
>> XSLT
>> filters which:
>>
>>  1. lists the Bayes mapping data for each account in a gnucash XML
>> file;
>>  2. resets the import mapping, by deleting all the Bayes mapping data
>>     for every account in a gnucash XML file; and
>>  3. prunes the import mapping data for certain target accounts in a
>>     gnucash XML file.
>>
>> A rather brief explanation of the situation and the GnuCash file
>> format,
>> with listings of all the XSLT filters, is in my freshly-written blog
>> post, /Resetting GnuCash’s import transaction matching/
>> <http://blog.jdlh.com/en/2016/07/29/resetting-gnucashs-import-transaction-matching/>.
>> Take a look. I hope it's helpful.
>
> This prompts me to reply with a newer version of the perl script I
> posted sometime back to do this. (and a whole lot more).  I have also
> attached the text of the full man page imbedded in this script.  
>
> Another GC user (Cheryl Wheeler) provided a patch that is incorporated
> into this version.  I have been discussing with Chris Good a more
> permanent location for this script, which can then be referenced in the
> GnuCash wiki. But I have not had time to do much work on that however,
> so I repost the script here for now.

Github would be a good place...

Regards,
John Ralls