[GNC-dev] Normalizing live data

Hendrik Boom-2

> On 2/1/19 5:36 AM, Wm via gnucash-devel wrote:
> >
> > [2] as long as the transaction stream balances the actual numbers
> > don't matter (there will be occasions where the numbers are important
> > but these tend to be number extremes related to commodities rather
> > than anyone using gnc to do a Mr Putin vs Mr Trump sports bet). In
> > most cases multiplying any matching numbers by the same semi-random
> > number should produce a good file for examination so long as it is done
> > consistently [4]

If the numbers in the file are integers times some account- or
currency-dependent unit, then just calculating the greatest common
divisor of all the obfuscated numbers will give a good guess as to the
semi-random multiplier.
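
A minimal sketch of the GCD idea, assuming every amount is an integer count of the smallest currency unit and that one secret factor was applied to the whole file. The function name, the factor 37, and the sample amounts are hypothetical and are not taken from GnuCash or from any script in this thread.

```python
from functools import reduce
from math import gcd


def guess_multiplier(obfuscated_amounts):
    """Return the GCD of the absolute amounts.

    The result is always a multiple of the real factor; it equals the
    real factor whenever the original amounts share no common divisor.
    """
    return reduce(gcd, (abs(a) for a in obfuscated_amounts), 0)


# Hypothetical original amounts in cents, all scaled by one secret factor.
secret = 37
original_cents = [1250, 499, 10000, 2375]
obfuscated = [a * secret for a in original_cents]

guess = guess_multiplier(obfuscated)
recovered = [a // guess for a in obfuscated]
print(guess)      # 37
print(recovered)  # [1250, 499, 10000, 2375]
```

The more distinct amounts a file contains, the more likely it is that they share no common divisor, so the guess tends to improve with file size.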

-- hendrik
_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel

Re: [GNC-dev] Normalizing live data

Geert Janssens-4
On Saturday 2 February 2019 14:31:43 CET, Hendrik Boom wrote:

> > On 2/1/19 5:36 AM, Wm via gnucash-devel wrote:
> > > [2] as long as the transaction stream balances the actual numbers
> > > don't matter (there will be occasions where the numbers are important
> > > but these tend to be number extremes related to commodities rather
> > > than anyone using gnc to do a Mr Putin vs Mr Trump sports bet). In
> > > most cases multiplying any matching numbers by the same semi-random
> > > number should produce a good file for examination so long as it is done
> > > consistently [4]
>
> If the numbers in the file are integers times some account- or
> currency-dependent unit, then just calculating the greatest common
> divisor of all the obfuscated numbers will give a good guess as to the
> semi-random multiplier.

Do you think that is still possible if a different random number is used for
each transaction? (That's how I understood Wm's suggestion.)

Each transaction will have its own random number. So for transaction A all
splits may have been multiplied by 450, and for transaction B all splits may
have been multiplied by 500.
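
A minimal sketch of per-transaction scaling as described here, assuming each transaction is a list of (account, amount) splits with integer amounts in the smallest currency unit. The `obfuscate` function, the factor range, and the sample accounts are hypothetical; this is not Wm's script and it does not use any GnuCash API.

```python
import random


def obfuscate(transactions, rng, low=2, high=1000):
    """Multiply every split of a transaction by that transaction's own factor."""
    result = []
    for splits in transactions:
        factor = rng.randint(low, high)  # one factor per transaction
        result.append([(account, amount * factor) for account, amount in splits])
    return result


# Hypothetical sample data, amounts in cents, debits positive.
transactions = [
    [("Assets:Checking", -5197), ("Expenses:Groceries", 4599), ("Expenses:Sales Tax", 598)],
    [("Assets:Checking", 250000), ("Income:Salary", -250000)],
]

obfuscated = obfuscate(transactions, random.Random(2019))

# Every transaction still sums to zero, because all of its splits
# were scaled by the same factor.
for splits in obfuscated:
    print(sum(amount for _, amount in splits))  # 0
```

Because the factor differs between transactions, running account balances computed from the obfuscated file no longer correspond to the real balances.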

Regards,

Geert



Re: [GNC-dev] Normalizing live data

Hendrik Boom-2
On Sat, Feb 02, 2019 at 04:30:30PM +0100, Geert Janssens wrote:

> On Saturday 2 February 2019 14:31:43 CET, Hendrik Boom wrote:
> > > On 2/1/19 5:36 AM, Wm via gnucash-devel wrote:
> > > > [2] as long as the transaction stream balances the actual numbers
> > > > don't matter (there will be occasions where the numbers are important
> > > > but these tend to be number extremes related to commodities rather
> > > > than anyone using gnc to do a Mr Putin vs Mr Trump sports bet). In
> > > > most cases multiplying any matching numbers by the same semi-random
> > > > number should produce a good file for examination so long as it is done
> > > > consistently [4]
> >
> > If the numbers in the file are integers times some account- or
> > currency-dependent unit, then just calculating the greatest common
> > divisor of all the obfuscated numbers will give a good guess as to the
> > semi-random multiplier.
>
> Do you think that is still possible if a different random number is used for
> each transaction? (That's how I understood Wm's suggestion.)
>
> Each transaction will have its own random number. So for transaction A all
> splits may have been multiplied by 450, and for transaction B all splits may
> have been multiplied by 500.

That might work. That way each transaction balances, but the account
balances will be nonsense.

Still, by finding the GCD of each transaction's splits you can produce a
lower bound on the transaction values. And if you, say, split off sales tax
into a separate split, your lower bound will often be the actual value.
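
A minimal sketch of that per-transaction GCD attack, assuming integer amounts in cents. The factors 450 and 500 come from Geert's example; the function name and the amounts themselves are hypothetical.

```python
from functools import reduce
from math import gcd


def reduce_transaction(amounts):
    """Divide a transaction's split amounts by their common GCD."""
    divisor = reduce(gcd, (abs(a) for a in amounts), 0)
    return [a // divisor for a in amounts] if divisor else list(amounts)


# Purchase with a separate sales-tax split: total, net price, tax.
purchase = [a * 450 for a in (-5197, 4599, 598)]
# Two equal and opposite splits.
salary = [a * 500 for a in (250000, -250000)]

print(reduce_transaction(purchase))  # [-5197, 4599, 598]
print(reduce_transaction(salary))    # [1, -1]
```

The real amounts are always an integer multiple of the reduced ones. With three splits that share no common divisor, the reduced values are the real values; with two equal and opposite splits the bound collapses to 1 and reveals nothing about the size of the transaction.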

And it's likely that one could also identify income and expense accounts as
such by the pattern of debits vs credits.
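
A crude sketch of that kind of pattern matching, assuming debits are stored as positive amounts and credits as negative ones, and that account names have been anonymized. The function, the 0.9 threshold, and the sample data are hypothetical. Scaling by a positive factor never changes the sign of a split, so the pattern survives the obfuscation.

```python
from collections import defaultdict


def classify_accounts(transactions, threshold=0.9):
    """Label accounts by the share of debit (positive) vs credit (negative) splits."""
    counts = defaultdict(lambda: [0, 0])  # account -> [debits, credits]
    for splits in transactions:
        for account, amount in splits:
            if amount > 0:
                counts[account][0] += 1
            elif amount < 0:
                counts[account][1] += 1

    labels = {}
    for account, (debits, credits) in counts.items():
        total = debits + credits
        if debits / total >= threshold:
            labels[account] = "expense-like"
        elif credits / total >= threshold:
            labels[account] = "income-like"
        else:
            labels[account] = "mixed"
    return labels


# Hypothetical obfuscated file with anonymized account names.
transactions = [
    [("acct-1", -2338650), ("acct-2", 2069550), ("acct-3", 269100)],
    [("acct-1", 125000000), ("acct-4", -125000000)],
    [("acct-1", -81000), ("acct-2", 81000)],
    [("acct-1", 96000000), ("acct-4", -96000000)],
]

labels = classify_accounts(transactions)
print(labels)
```

This only separates one-directional accounts from mixed ones; an asset account that only ever receives deposits would also be labeled expense-like, so the result is a hint rather than an identification.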

-- hendrik

>
> Regards,
>
> Geert
>
>

Re: [GNC-dev] Normalizing live data

John Ralls-2


> On Feb 2, 2019, at 9:44 AM, Hendrik Boom <[hidden email]> wrote:
>
> On Sat, Feb 02, 2019 at 04:30:30PM +0100, Geert Janssens wrote:
>> On Saturday 2 February 2019 14:31:43 CET, Hendrik Boom wrote:
>>>> On 2/1/19 5:36 AM, Wm via gnucash-devel wrote:
>>>>> [2] as long as the transaction stream balances the actual numbers
>>>>> don't matter (there will be occasions where the numbers are important
>>>>> but these tend to be number extremes related to commodities rather
>>>>> than anyone using gnc to do a Mr Putin vs Mr Trump sports bet). In
>>>>> most cases multiplying any matching numbers by the same semi-random
>>>>> number should produce a good file for examination so long as it is done
>>>>> consistently [4]
>>>
>>> If the numbers in the file are integers times some account- or
>>> currency-dependent unit, then just calculating the greatest common
>>> divisor of all the obfuscated numbers will give a good guess as to the
>>> semi-random multiplier.
>>
>> Do you think that is still possible if a different random number is used for
>> each transaction? (That's how I understood Wm's suggestion.)
>>
>> Each transaction will have its own random number. So for transaction A all
>> splits may have been multiplied by 450, and for transaction B all splits may
>> have been multiplied by 500.
>
> That might work. That way each transaction balances, but the account
> balances will be nonsense.
>
> Still, by finding the GCD of each transaction's splits you can produce a
> lower bound on the transaction values. And if you, say, split off sales tax
> into a separate split, your lower bound will often be the actual value.
>
> And it's likely that one could also identify income and expense accounts as
> such by the pattern of debits vs credits.
>

So maybe we should just forget it and continue the practice of asking users to send their account files directly to a developer with the promise of confidentiality if they're unable to reproduce the bug in a test file. No one has demurred yet.

Regards,
John Ralls


Re: [GNC-dev] Normalizing live data

GnuCash - Dev mailing list
In reply to this post by Hendrik Boom-2
On 02/02/2019 13:31, Hendrik Boom wrote:

>
>> On 2/1/19 5:36 AM, Wm via gnucash-devel wrote:
>>>
>>> [2] as long as the transaction stream balances the actual numbers
>>> don't matter (there will be occasions where the numbers are important
>>> but these tend to be number extremes related to commodities rather
>>> than anyone using gnc to do a Mr Putin vs Mr Trump sports bet). In
>>> most cases multiplying any matching numbers by the same semi-random
>>> number should produce a good file for examination so long as it is done
>>> consistently [4]
>
> If the numbers in the file are integers times some account- or
> currency-dependent unit, then just calculating the greatest common
> divisor of all the obfuscated numbers will give a good guess as to the
> semi-random multiplier.

My test script includes randomness introduced by the user. Could the
numbers be worked backwards? Possibly, maybe even probably, from a purely
numeric POV. Will the remote person, having done the work, know what
each of the numbers means? Nope. That is the point I am suggesting we
go for: a numerically sensible file that makes no financial sense to
anyone else.

Will there be times this won't work? Of course, in which case we revert
to the existing system of having to trust someone with your live data.

All I am offering to do is reduce the number of times someone has to
trust someone.  If the community decides this isn't worth the effort so
be it, but I think we should at least think it through.

So, Hendrik, I acknowledge your point but don't think it is significant.

--
Wm


Re: [GNC-dev] Normalizing live data

GnuCash - Dev mailing list
In reply to this post by Hendrik Boom-2
On 02/02/2019 17:44, Hendrik Boom wrote:

> On Sat, Feb 02, 2019 at 04:30:30PM +0100, Geert Janssens wrote:
>> On Saturday 2 February 2019 14:31:43 CET, Hendrik Boom wrote:
>>>> On 2/1/19 5:36 AM, Wm via gnucash-devel wrote:
>>>>> [2] as long as the transaction stream balances the actual numbers
>>>>> don't matter (there will be occasions where the numbers are important
>>>>> but these tend to be number extremes related to commodities rather
>>>>> than anyone using gnc to do a Mr Putin vs Mr Trump sports bet). In
>>>>> most cases multiplying any matching numbers by the same semi-random
>>>>> number should produce a good file for examination so long as it is done
>>>>> consistently [4]
>>>
>>> If the numbers in the file are integers times some account- or
>>> currency-dependent unit, then just calculating the greatest common
>>> divisor of all the obfuscated numbers will give a good guess as to the
>>> semi-random multiplier.
>>
>> Do you think that is still possible if a different random number is used for
>> each transaction? (That's how I understood Wm's suggestion.)
>>
>> Each transaction will have its own random number. So for transaction A all
>> splits may have been multiplied by 450, and for transaction B all splits may
>> have been multiplied by 500.
>
> That might work. That way each transaction balances, but the account
> balances will be nonsense.
>
> Still, by finding the GCD of each transaction's splits you can produce a
> lower bound on the transaction values. And if you, say, split off sales tax
> into a separate split, your lower bound will often be the actual value.
>
> And it's likely that one could also identify income and expense accounts as
> such by the pattern of debits vs credits.

You're presuming a level of snooping that I don't think exists amongst
Geert, John, et al.

--
Wm


Re: [GNC-dev] Normalizing live data

GnuCash - Dev mailing list
In reply to this post by John Ralls-2
On 02/02/2019 18:00, John Ralls wrote:

> So maybe we should just forget it and continue the practice of asking users to send their account files directly to a developer with the promise of confidentiality if they're unable to reproduce the bug in a test file.

That's what I'm thinking.

> No one has demurred yet.

Not true.  Some bugs just hang around longer than others.





Re: [GNC-dev] Normalizing live data

John Ralls-2


> On Feb 9, 2019, at 2:28 AM, Wm via gnucash-devel <[hidden email]> wrote:
>
> On 02/02/2019 18:00, John Ralls wrote:
>
>> So maybe we should just forget it and continue the practice of asking users to send their account files directly to a developer with the promise of confidentiality if they're unable to reproduce the bug in a test file.
>
> That's what I'm thinking.
>
>> No one has demurred yet.
>
> Not true.  Some bugs just hang around longer than others.

I meant that no one has declined to send a file directly to me when asked. That hasn't anything to do with bugs hanging around: There are more bugs than there are developer hours available to work on them so they get triaged.

Regards,
John Ralls
