[GNC-dev] Normalizing live data, a suggestion for discussion

[GNC-dev] Perl SheBang

Frank H. Ellenberger-3


On 04.02.19 at 09:40, Christian Stimming wrote:
> Thanks for the pointer. I've copied this script into our git at
>   ./util/obfuscate.pl

While for most gnc-fq-* scripts we use
#!@-PERL-@
and adjust them while building.

In utils all perl scripts are hardcoded to
#! /usr/bin/perl
Wouldn't it make sense to have them also configurable?
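
The build-time adjustment mentioned above can be pictured as a plain token substitution on the first line of the script. A minimal sketch in Python, assuming the placeholder is the literal `@-PERL-@` quoted above; the function name `substitute_perl` and the interpreter path used in the example are hypothetical, and the real build system may well do this differently:

```python
PLACEHOLDER = "@-PERL-@"  # token shown in the gnc-fq-* shebang above


def substitute_perl(script_text, perl_path):
    """Replace the placeholder in the shebang line with a concrete interpreter path.

    Only the first line is touched, and only if it is a shebang line that
    actually contains the placeholder; everything else is returned unchanged.
    """
    first, sep, rest = script_text.partition("\n")
    if first.startswith("#!") and PLACEHOLDER in first:
        first = first.replace(PLACEHOLDER, perl_path)
    return first + sep + rest


# Hypothetical usage: the path would normally come from the build configuration.
configured = substitute_perl("#!@-PERL-@\nprint 1;\n", "/usr/bin/perl")
```

A hardcoded `#! /usr/bin/perl` line passes through such a step untouched, which is the difference between the two groups of scripts discussed here.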

Regards
Frank

_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel

Re: [GNC-dev] Perl SheBang

Alain D D Williams
On Mon, Feb 04, 2019 at 10:23:46AM +0100, Frank H. Ellenberger wrote:

>
>
> On 04.02.19 at 09:40, Christian Stimming wrote:
> > Thanks for the pointer. I've copied this script into our git at
> >   ./util/obfuscate.pl
>
> While for most gnc-fq-* scripts we use
> #!@-PERL-@
> and adjust them while building.
>
> In utils all perl scripts are hardcoded to
> #! /usr/bin/perl
> Wouldn't it make sense to have them also configurable?

How about going:

#! /usr/bin/env perl

--
Alain Williams
Linux/GNU Consultant - Mail systems, Web sites, Networking, Programmer, IT Lecturer.
+44 (0) 787 668 0256  https://www.phcomp.co.uk/
Parliament Hill Computers Ltd. Registration Information: https://www.phcomp.co.uk/contact.php
#include <std_disclaimer.h>

Re: [GNC-dev] Normalizing live data, a suggestion for discussion

Geert Janssens-4
In reply to this post by GnuCash - Dev mailing list
On Saturday, 2 February 2019 22:36:18 CET, Wm via gnucash-devel wrote:

> On 02/02/2019 15:24, Geert Janssens wrote:
> > Yes, if you use business features, you may have entered business
> > identifying data in File->Properties. I think that's what David is
> > referring to.
> I agree, the third party should not be identified.
>
> > Similarly there may be customer and vendor data (names addresses) in the
> > book that should equally be obfuscated. Just random data is fine.
>
> Yes.
>
> Geert, at the moment I am putting guid in place of random, do you think
> that is a wrong way to approach this?
>
I think GUIDs are probably fine as well.

Note I'm going by the theoretical goal of not being able to reconstruct the
user's real financial data from the obfuscated file. Personally I'm not
interested in doing that at all, but people's paranoia levels may vary.

So, talking of GUIDs: if I remember correctly, the default GUIDs for accounts
coming from GnuCash account templates are hard-coded (or at least they used to
be until somewhere in the 2.6 series).

So if that is still true, then using GUIDs for account names is only fake
obfuscation. Perhaps these GUIDs should be replaced throughout the book during
the obfuscation, before replacing account names with GUIDs.
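
Replacing GUIDs "throughout the book" only works if every occurrence of the same GUID gets the same replacement, otherwise references between objects break. A minimal sketch in Python, assuming a GUID appears in the file as a run of 32 lowercase hex digits; `remap_guids` is a hypothetical helper, not part of the obfuscate script:

```python
import re
import uuid

# Assumed GUID shape: exactly 32 lowercase hex digits standing on their own.
GUID_RE = re.compile(r"\b[0-9a-f]{32}\b")


def remap_guids(text):
    """Replace every GUID with a fresh random one, consistently across the text."""
    mapping = {}  # old GUID -> new GUID, so repeated references stay linked

    def replace(match):
        old = match.group(0)
        if old not in mapping:
            mapping[old] = uuid.uuid4().hex
        return mapping[old]

    return GUID_RE.sub(replace, text)
```

Note that this pattern would also rewrite any other 32-digit hex string in the file, so a real implementation would probably want to restrict itself to the elements known to hold GUIDs.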

> Actually, the nearer we get to complete random the less useful the file
> becomes.  Actual random data is harder than most people think and pretty
> much defeats the purpose if you think about it.
>
From a human's point of view a GUID is just random numbers, so I don't see how
that makes a difference. If the same random value is used wherever the data was
the same in the original book, it's just like using a GUID. And I'm not talking
about numbers for this part; I'm talking about customer names, vendor addresses,
that kind of stuff.

> > Continuing on that vein, if you have bills and invoices, aside from
> > randomizing the transaction's split amounts and values you'll also have to
> > do the same for invoice entries.
>
> I don't think that is true in most situations and even if what you say
> is true, I don't see it as a good argument against *attempting* a
> normalized book for most people.
>
It's true if the bug to investigate is somewhere in the business code. In that
case what your invoice data says should match what the resulting transactions
say. Those are stored in different parts in the book, but are interrelated.

But even if the bug is not in business data, the business data should be
properly anonymized or removed anyway, so that the user can confidently share
the file without risking that real financial or private info can be extracted
from it. Of course in that context the business data no longer has to be
consistent, though I still believe it makes debugging harder if it isn't.

> > And to make the book useful for detecting
> > business data bugs this should happen in such a way that invoice tax and
> > discount amounts remain consistent after multiplying with random numbers
> > *and* that the invoice totals continue to match the business transactions
> > amounts in AR/AP accounts.
>
> There will be situations that involve the person doing the triage
> needing to see actual transactions, I have already commented on that.
>
Sure. However that's not what I'm implying here. The extra business
requirements are an extension of your initial concept that transactions should
continue to balance. From a business data point of view invoices with their
entries should continue to balance with their invoice transactions or the data
quickly becomes meaningless.
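
One way to picture the consistency requirement: if the invoice entries and the posted transaction amount are multiplied by the same factor using exact arithmetic, the invoice keeps balancing against its transaction. A minimal sketch in Python, assuming amounts can be treated as exact rationals; `scale_invoice` and the sample figures are hypothetical and say nothing about how the actual script handles business data:

```python
from fractions import Fraction


def scale_invoice(entries, posted_total, factor):
    """Scale every invoice entry and the posted transaction total by one factor."""
    scaled_entries = [amount * factor for amount in entries]
    return scaled_entries, posted_total * factor


# Hypothetical invoice: three entries and the amount posted to AR.
entries = [Fraction("19.99"), Fraction("5.01"), Fraction("100.00")]
posted = sum(entries)

scaled_entries, scaled_posted = scale_invoice(entries, posted, Fraction(7, 3))

# Exact rational arithmetic keeps the invoice and its transaction in balance.
assert sum(scaled_entries) == scaled_posted
```

The equality holds only because nothing is rounded. As soon as each scaled amount is rounded to currency precision, or different factors are used for the invoice and for its payment, the totals can drift apart, which is the kind of inconsistency described above.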

> > And to make that one level more complicated, after that the payment
> > transactions *also* have to continue to match the new randomized invoice
> > amount (if the invoice was paid in full).
>
> Ummmm, I don't think that is true.  If the munged numbers match (and
> they will, that is what the script will do) the transaction stream will
> be OK.
>
> It is possible I have missed your point, Geert, but I think it is
> looking like I understand the contents of the gnc files better than you :(
>
You did miss the point. You only think of balancing transactions. I'm also
thinking of balancing lots, a more hidden aspect of the business data that's
crucial to debug payment issues. My next reservation was also about consistent
lots.

> > It doesn't end there, payments can be split over multiple invoices, so
> > again when one randomizes invoice amounts care must be taken to adjust
> > the payments in proportion to the invoice amount change or fully paid
> > invoices suddenly can become partially paid or overpaid.
>
> Not true.
>
> Geert, I don't want to say this but I believe you are actually wrong,
> for once.

It would be more useful to explain why you think that.
>
> > While this is probably all possible I believe the resulting script will be
> > so complex that it will become a source of bugs in itself which would
> > divert developer time to debugging and maintaining this script rather
> > than working on the effectively reported bug for which a sample data file
> > was asked in the first place...
>
> Hmmmm, I accept your point and disagree.
>
I agree that may have been overly pessimistic :)

> > Up until a book with only transactions, no business data at all it sounded
> > like a useful tool.
>
> Be a brave man, Geert, most people don't use the business functions :)
>
Right. For those who do that data still needs to be anonymized or you should
explicitly state somewhere it isn't to avoid misunderstandings.

> > Oh and we haven't mentioned SXs and budgets yet...
>
> Unless they are material to the file being investigated I suggest we
> just delete all SXs and budget stuff.
>
Reasonable.

> As far as I am concerned this conversation is ongoing, if only because
> Geert says he still needs a file from me to replicate a basic problem
> that I don't think needs any data from me at all.

It has been a while so you may want to refresh my memory... Which bug
triggered this again ?

Geert



Re: [GNC-dev] Normalizing live data, a suggestion for discussion

Geert Janssens-4
In reply to this post by GnuCash - Dev mailing list
On Saturday, 2 February 2019 22:36:18 CET, Wm via gnucash-devel wrote:

> On 02/02/2019 15:24, Geert Janssens wrote:
> > As for Colin's question: on Windows and MacOS sqlite is supported out of
> > the box. On linux it may require the additional installation of a libdbi
> > driver. Most distros I know have packages for this driver but they may
> > not be installed by default.
>
> It would be an odd distro that excluded SQLite, it is a requisite for a
> lot of other stuff like browsers.  Thinking aloud: maybe a server only
> install might not have it or someone stupid enough to put their data on
> Amazon might not have it available.  The question then becomes, why was
> the person so stupid?

Well I do understand sqlite is available by default, but gnucash requires
libdbi with the sqlite backend (which in turn indeed uses sqlite). I haven't
checked whether all supported distros also have that combination installed by
default. I don't know if web browsers also use libdbi. I know Firefox does not.

And I haven't and won't spend time to check this for all those distros.

However I do agree this should only be a small hurdle. And I understand your
script is an optional aid for those people that would want a better privacy
guarantee before sending their data in for analysis.

Geert



Re: [GNC-dev] Perl SheBang

John Ralls-2
In reply to this post by Alain D D Williams


> On Feb 4, 2019, at 1:27 AM, Alain D D Williams <[hidden email]> wrote:
>
> On Mon, Feb 04, 2019 at 10:23:46AM +0100, Frank H. Ellenberger wrote:
>>
>>
>> On 04.02.19 at 09:40, Christian Stimming wrote:
>>> Thanks for the pointer. I've copied this script into our git at
>>>  ./util/obfuscate.pl
>>
>> While for most gnc-fq-* scripts we use
>> #!@-PERL-@
>> and adjust them while building.
>>
>> In utils all perl scripts are hardcoded to
>> #! /usr/bin/perl
>> Wouldn't it make sense to have them also configurable?

No, because gnc-fq-* are build products and util/*.pl are build tools. Since obfuscate.pl isn't a build tool it doesn't belong in util; if we're going to distribute it for users (and if we're not why put it in the repo at all?) then it needs to go somewhere CMake can find it and install it to $CMAKE_INSTALL_PREFIX/bin or libexec and it needs to be renamed to "gnc-xml-obfuscate".

>
> How about going:
>
> #! /usr/bin/env perl

That's widely regarded as a security hole, though it's also widely used. Since it's trivial to override the shebang by calling the perl of your choice and passing the script as $1 it's kind of pointless.

While we're on the topic of shebangs remember that they don't work on Windows. Remember too that running this obfuscate script on Windows will require the user to install perl. They might already have done so for Finance::Quote, but lots of users don't use F::Q.

Regards,
John Ralls




Re: [GNC-dev] obfuscation script, windows/Perl SheBang

Christian Stimming-4
On Monday, 4 February 2019, 16:32:38 CET, John Ralls wrote:

> >>> Thanks for the pointer. I've copied this script into our git at
> >>>
> >>>  ./util/obfuscate.pl
> >>
> >> While for most gnc-fq-* scripts we use
> >> #!@-PERL-@
> >> and adjust them while building.
> >>
> >> In utils all perl scripts are hardcoded to
> >> #! /usr/bin/perl
> >> Wouldn't it make sense to have them also configurable?
>
> No, because gnc-fq-* are build products and util/*.pl are build tools. Since
> obfuscate.pl isn't a build tool it doesn't belong in util; if we're going
> to distribute it for users (and if we're not why put it in the repo at
> all?) then it needs to go somewhere Cmake can find it and install it to
> $CMAKE_INSTALL_PREFIX/bin or libexec and it needs to be renamed to
> "gnc-xml-obfuscate".

Sure. Since it's code and I wanted to edit it, I wanted to put it somewhere in
git. If there is a better place for it, please anyone feel free to move it
there.

> > How about going:
> >
> > #! /usr/bin/env perl
>
> That's widely regarded as a security hole, though it's also widely used.
> Since it's trivial to override the shebang by calling the perl of your
> choice and passing the script as $1 it's kind of pointless.
>
> While we're on the topic of shebangs remember that they don't work on
> Windows. Remember too that running this obfuscate script on Windows will
> require the user to install perl. They might already have done so for
> Finance::Quote, but lots of users don't use F::Q.

The script won't work on Windows anyway, at least not out of the box, because
it not only needs a Perl installation including XML::DOM, but also some word
list. On Linux this is available under /usr/share/dict/words (symlink to the
default language's word list), but for windows some other choice has to be
written into the obfuscate.pl script. Currently this isn't the case, so it
will just complain about the missing word list.
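
The word-list dependency could be made less platform-specific by trying a list of candidate locations and failing with a clear message if none exists. A minimal sketch in Python of that lookup logic; only `/usr/share/dict/words` comes from the description above, while `find_word_list` and any additional candidate paths a caller passes in are hypothetical:

```python
import os

# The Linux location named above; other platforms would need their own entries.
DEFAULT_CANDIDATES = ("/usr/share/dict/words",)


def find_word_list(candidates=DEFAULT_CANDIDATES):
    """Return the first candidate path that is an existing file.

    Raises FileNotFoundError listing everything that was tried, so the user
    can see which locations to provide a word list in.
    """
    for path in candidates:
        if os.path.isfile(path):
            return path
    raise FileNotFoundError("no word list found; tried: " + ", ".join(candidates))
```

On Windows the caller would have to supply a path of their own, for example to a word list downloaded next to the script.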

Regards,
Christian




Re: [GNC-dev] Normalizing live data, a suggestion for discussion

GnuCash - Dev mailing list
In reply to this post by David Cousens
On 03/02/2019 02:01, David Cousens wrote:

> As Geert pointed out whole of program testing is very difficult and rapidly
> reaches a situation where complexity is equal to or greater than  the
> program complexity and this is really what gave rise to unit testing where
> you test individual components which do a specific function.

That can't fix a problem where an incorrect presumption was made in the
first place.

> One area in which an example file  rather than a test file might be useful
> is in developing  the documentation. The guide section on Accounts
> Transaction following through to Personal Finances
> in essence constructs a simple file while doing the tutorial. Here though it
> is  the process of constructing the data in the file that is useful. A
> completed example file is not of great use.

I'd advise against using any file as the right file for documentation
purposes.  There are just too many edge cases.

Something I think would be amusing rather than instructive would be to
put all of the example tx in the docs into one file.  I doubt it would
be useful to anyone other than an historian of finance programs but it
would be fun to see what we ended up with.  If someone is thinking of
presenting a paper at a conference try it, mention me if you are feeling
generous :)

> It is also likely that most problems which are likely to require this depth
> of investigation are unlikely to show up in a test file unless you can
> execute a series of entries in a scripted manner i.e. interact with the gui
> from a script and this is not possible with GnuCash at the moment AFAIK.
> The problem is usually somewhere in the process of getting to the results in
> the file and what is in the file is merely a symptom of the problem.

gnc is a transaction stream application.  each time you open a file it
starts from 0 and does addition and subtraction.  no more no less.

on top of that we have pretty stuff, convenient ways of adding new
transactions to the stream, convenient ways of reporting the results of
the stream.

nevertheless, it is still just a program interpreting a stream of
transactions.
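
The "starts from 0 and does addition and subtraction" view can be sketched in a few lines. This is a deliberately simplified illustration in Python, not a description of GnuCash's internals; `replay`, the account names and the amounts are all hypothetical:

```python
from collections import defaultdict
from fractions import Fraction


def replay(splits):
    """Replay a stream of (account, amount) splits, starting every balance at 0."""
    balances = defaultdict(Fraction)  # Fraction() is 0, so each account starts empty
    for account, amount in splits:
        balances[account] += amount
    return dict(balances)


# Two balanced transactions, each written as a pair of splits.
stream = [
    ("Assets:Bank", Fraction(100)), ("Income:Salary", Fraction(-100)),
    ("Expenses:Food", Fraction(10)), ("Assets:Bank", Fraction(-10)),
]
balances = replay(stream)
```

In this picture, any obfuscation that keeps each transaction balanced leaves the replay consistent, whatever the individual amounts are.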

gnc is a convenience.  I don't see why I should have to give live data
to people I don't know in person ... and I don't even have super secret
stuff like tax havens or a Donald Trump blow job account or a religious
belief.

I just feel uncomfortable showing ordinary tx to people I don't know, it
is that simple to me.

Q: Why does someone need to see *my* (or your) tx to fix a problem?
A: they don't

So, we are stuck.

--
Wm


Re: [GNC-dev] Normalizing live data, a suggestion for discussion

GnuCash - Dev mailing list
In reply to this post by John Ralls-2
On 03/02/2019 16:03, John Ralls wrote:

>
>
>> On Feb 2, 2019, at 8:10 PM, David Carlson <[hidden email]> wrote:
>>
>> OK, I want to try https://wiki.gnucash.org/wiki/ObfuscateScript but I am
>> not a computer programmer.  I have no clue how to use it.  Can someone help
>> me?
>
> Run it from a command line using perl, assuming here that you have Strawberry installed on C:
>
>    c:\strawberry\perl\bin\perl.exe ObfuscateScript path/to/myfile.gnucash
>
> Note that it rewrites the file in place, so make a copy and run it on that. The file needs to be uncompressed.
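
Both precautions in that note (work on a copy, and make sure the copy is uncompressed) can be handled in one step before running the script. A minimal sketch in Python, assuming that a compressed data file is gzip-compressed and can be recognised by the gzip magic bytes; `make_plain_copy` and the file names in the comment are hypothetical:

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def make_plain_copy(source, target):
    """Write an uncompressed copy of source to target; source is never modified."""
    with open(source, "rb") as probe:
        compressed = probe.read(2) == GZIP_MAGIC
    opener = gzip.open if compressed else open
    with opener(source, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


# Hypothetical usage:
#   make_plain_copy("myfile.gnucash", "myfile-copy.gnucash")
# and then run the obfuscation script on the copy only.
```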

Apart from the write in place I quite like it as an idea to progress
thought.

Positive: it is in perl which (many|most) people may have a working
version of if they are using F::Q

Negative: it doesn't reconcile well, but this may actually be a positive
because ...

Positive: if the script breaks some splits this should be seen as a good
thing by some, it makes the work of the super secret agents running gnc
harder.

Thinking aloud: another way of normalizing would be to split to some
point beyond usefulness and let gnc put it back together again using
Actions / Check & Repair

===

Remember folks, the idea is a file that someone else (who probably didn't
vote for the idiot Trump) could look at to see *your* problem.

Does the remote person want to see you paid USD10 for a burger meal and
some beer then vomited on the pavement and had to pay a fine for that?
Nope.  The remote person wants to see what the fuck you have put in your
file that is screwing up the transaction stream.

--
Wm


Re: [GNC-dev] Normalizing/obfuscating live data

GnuCash - Dev mailing list
In reply to this post by Christian Stimming-4
On 04/02/2019 08:40, Christian Stimming wrote:
> In a real data file there are still more places with text that need to be
> modified, e.g. the scheduled transaction templates, bayes import matching, and
> such. Also, the dates are left unmodified which may or may not be a problem.

Stripping out scheduled tx should be OK unless they are specific to the
problem being reported.

Because gnc is, by definition, a tx stream processor future tx are not
normally noticed until encountered.  (Personally I love the ability to
generate tx in the future, it allows me to model my immediate monetary
future.  A very positive thing.)

I think all of the import stuff should be stripped too.

Dates are more interesting, Christian

people (right or wrong) place value on dates (in my culture it will be
14 Feb soon)

How about this as a proposal?

If the dates in the file are in sequence it usually won't matter how
much time is in between each date.

Why do I say this?

Because gnc is a *sequential* tx processor and as such the *sequence* of
transactions can be important but the actual dates often aren't.

If anyone is struggling with this conceptually, in a gnc file the date
defines the order in which a tx is processed, that is what a transaction
stream program does.  The tx may be in the wrong order (this is part of
the reason why gnc does the weird thing of loading everything into
memory, it can't trust the file!) so it has to work out which tx is
first, which one comes next and so on.

I don't think I am teaching ChristianS anything, just explaining stuff.

So, I think the dates can be modified so long as the *order* of dates
and times is left extant.

Proposal: make the first date random (after 1971 or some later date for
technical reasons), treat the tx in date+time sequence adding one day
each time a difference is noted.  This will produce a time compressed
file that obfuscates when someone actually did something.
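
The proposal can be sketched roughly as follows in Python. It assumes that "a difference is noted" means the calendar date changes between consecutive transactions, and it keeps the time of day so that ordering within a day survives; `compress_dates` is a hypothetical name, and the start date is passed in rather than drawn at random so the example stays reproducible:

```python
from datetime import date, datetime, timedelta


def compress_dates(timestamps, start):
    """Map each timestamp to a compressed one, preserving chronological order.

    The earliest day becomes `start`; every later distinct calendar day is
    moved to exactly one day after the previous one, whatever the real gap.
    """
    mapping = {}
    current = start
    previous_day = None
    for ts in sorted(timestamps):
        day = ts.date()
        if previous_day is not None and day != previous_day:
            current += timedelta(days=1)
        mapping[ts] = datetime.combine(current, ts.time())
        previous_day = day
    return mapping


# Hypothetical transaction times spread over ten months.
stamps = [
    datetime(2018, 3, 1, 9, 0), datetime(2018, 3, 1, 17, 0),
    datetime(2018, 7, 15, 8, 0), datetime(2019, 1, 2, 12, 0),
]
compressed = compress_dates(stamps, date(2000, 1, 1))
```

Here the ten months collapse into three consecutive days starting on 2000-01-01. Anything that depends on real intervals, such as month-end reports or scheduled transactions, would of course no longer look the same.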

Thoughts?

--
Wm


Re: [GNC-dev] Normalizing live data, a suggestion for discussion

GnuCash - Dev mailing list
In reply to this post by David Carlson-4
On 03/02/2019 04:10, David Carlson wrote:
> OK, I want to try https://wiki.gnucash.org/wiki/ObfuscateScript but I am
> not a computer programmer.  I have no clue how to use it.  Can someone help
> me?

it is perl, if you have F::Q working you probably have enough kit to run it.

--
Wm



Re: [GNC-dev] Perl SheBang

GnuCash - Dev mailing list
In reply to this post by John Ralls-2
On 04/02/2019 15:32, John Ralls wrote:

> While we're on the topic of shebangs remember that they don't work on Windows. Remember too that running this obfuscate script on Windows will require the user to install perl. They might already have done so for Finance::Quote, but lots of users don't use F::Q.

True, there are a number of farmers that don't know why the person they
voted for changed the price of their crops.

--
Wm


Re: [GNC-dev] obfuscation script, windows/Perl SheBang

GnuCash - Dev mailing list
In reply to this post by Christian Stimming-4
On 04/02/2019 17:03, Christian Stimming wrote:

> The script won't work on Windows anyway, at least not out of the box, because
> it not only needs a Perl installation including XML::DOM, but also some word
> list. On Linux this is available under /usr/share/dict/words (symlink to the
> default language's word list), but for windows some other choice has to be
> written into the obfuscate.pl script. Currently this isn't the case, so it
> will just complain about the missing word list.

you know perl has a built in complain, I presume

--
Wm



Re: [GNC-dev] Normalizing live data, a suggestion for discussion

GnuCash - Dev mailing list
In reply to this post by David Cousens
On 02/02/2019 23:05, David Cousens wrote:

> I don't since I retired a few years ago, but I did for 8 years prior to
> retiring (and I used MYOB for the 10 years prior to that before escaping). I
> am certainly not alone. You could have a proviso that the script won't work
> for files using the business functions but that then detracts considerably
> from its usefulness as a general diagnostic tool.

I'm respecting you more as we progress, DavidC.

The broad point is that a normalization is without opinion or value.

No person would know if you had run your business successfully or not.


The fear is "the government will know I earned 20AUD on a contract and I
didn't report".

struth is your government has much larger issues to deal with, ask them
to pay attention to that.  That is, if you can manage one government
for more than 3 fucking months at a time!

---

Point: MYOB is respected in Oz, Liz says so, it must be true.  Rest of
the world doesn't give a flying fuck about whether it is a good double
accounting prog or not.

---

> Sqlite itself and its availability on Linux is not really an issue. Most
> distros have it in their software repositories. What may be more of an issue
> is that a lot of people who don't use the database backends because they
> don't want the additional hassles of learning to use and maintain databases
> may be reluctant to install it.

True, I think this is also a red herring, most people are using Windows
and SQLite comes with gnc for free.

Shouldn't you be asking why more people aren't using what they already have?

> I'm retired.

Disagree, your mind is still active :)

> Taking an extra half day to learn something
> new doesn't worry me as long as it happens before my time is up. But if I am
> running a busy life and/or a business as I used to, I would be more
> reluctant. Again not a show stopper, only a limitation on general
> applicability.
>
> David Cousens

Have a hug.
--
Wm
