htdocs and gettext

Neil Williams-2
OK. I've got gettext working with PHP to translate a single website using as
many PO files as we can create.

It means yet more restructuring of the htdocs and running some scripts to
create and update the PO files, much as we do with 'make pot' in trunk.

I think svn should simply be a copy of the final website, including binary
translation .mo files in real locations. I'll create some bash scripts to
sit in htdocs/ to help create, update and merge the translations.

This would save writing svn hooks to run msgfmt.

I'll have to restructure the top level directories as well because we don't
need these locale directories (the LC_MESSAGES ones) exposed via apache so it
will mean adjusting some Apache settings to map the root directory of the
website to a directory *beneath* htdocs. For example, where the virtual host
sets a DocumentRoot of htdocs, it will need to be changed to htdocs/www.

We'd then have:
htdocs/
htdocs/en/
htdocs/en/LC_MESSAGES/
htdocs/en/LC_MESSAGES/gnucash-htdocs.mo
htdocs/de/
htdocs/de/LC_MESSAGES/
htdocs/de/LC_MESSAGES/gnucash-htdocs.mo
....
htdocs/www/
htdocs/www/index.phtml
....
htdocs/www/images/
htdocs/news/
.... (the news content is read in by the scripts, so news/ does not need to be
public)
htdocs/externals/
(externals are also read in directly by the scripts, so a URL is not needed,
only a local filesystem path.)

As much as possible will be moved above the www/ directory.

To protect the .svn directories below www/, this snippet should be added to
the virtual host config:
        <Directory /opt/svn/htdocs/www/*/.svn>
                Order Deny,Allow
                deny from all
        </Directory>

Change /opt/svn/htdocs/ to match the DocumentRoot of the virtual host.

I'm hoping that most of this will be automatic (according to the browser
setup) but there will also be direct links to languages, via the common
header file, that would deliberately set the language of the page.

As such, I would recommend that anyone wanting to translate the site wait
until a POT file is available for the entire site, then translate and upload
the PO file using svn.

.... pending ....

--

Neil Williams
=============
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/


_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel

Re: htdocs and gettext

Josh Sled
On Mon, 2006-01-23 at 16:33 +0000, Neil Williams wrote:
> OK. I've got gettext working with PHP to translate a single website using as
> many PO files as we can create.

Awesome.

> It means yet more restructuring of the htdocs and running some scripts to
> create and update the PO files, much as we do with 'make pot' in trunk.
>
> I think svn should simply be a copy of the final website, including binary
> translation .mo files in real locations. I'll create some bash scripts to
> sit in htdocs/ to help create, update and merge the translations.

If we're going to "bake" the site via some script "compilation" step,
then can we get away from using PHP entirely?  :)


> I'll have to restructure the top level directories as well because we don't
> need these locale directories (the LC_MESSAGES ones) exposed via apache so it
> will mean adjusting some Apache settings to map the root directory of the
> website to a directory *beneath* htdocs. e.g. where the virtual host maps a
> DocumentRoot of htdocs, it will need to be modified to map htdocs/www

Also, does it then make sense to structure the files so that Apache can do
language content negotiation?
<http://httpd.apache.org/docs/2.0/content-negotiation.html>  It would
basically mean a different structure; one like:

 htdocs/www/index.html.en
 htdocs/www/index.html.de
 htdocs/www/index.html.[...]
 htdocs/www/features.html.{en,de,...}


> To protect the .svn directories below www/, this snippet should be added to
> the virtual host config:
>         <Directory /opt/svn/htdocs/www/*/.svn>
>                 Order Deny,Allow
>                 deny from all
>         </Directory>

I suppose this, or maybe using `svn export` so the .svn files aren't
even created?

--
...jsled
http://asynchronous.org/ - `a=jsled; b=asynchronous.org; echo ${a}@${b}`

Re: htdocs and gettext

Neil Williams-2
On Monday 23 January 2006 5:04 pm, Josh Sled wrote:
> On Mon, 2006-01-23 at 16:33 +0000, Neil Williams wrote:
> > OK. I've got gettext working with PHP to translate a single website using
> > as many PO files as we can create.
>
> Awesome.

:-)  It's not ready to commit as a demo yet, but it does work locally.

> If we're going to "bake" the site via some script "compilation" step,
> then can we get away from using PHP entirely?  :)

Umm, no. If anything, PHP becomes a little more important as although the
TRANSLATIONS are handled via PO, the CONTENT strings now need to be wrapped
in php functions.

<h1><?php echo _("Welcome to GnuCash.org")?></h1>
instead of
<h1>Welcome to GnuCash.org</h1>

It means that those altering the original content need to be a little careful
in their edits (to retain the string and bracket markers). For example,
translatable content that includes quote marks should use the entity &quot;
which is safer than expecting \" to be preserved through all translations.

We aren't "baking" or compiling the site in the way you imply. The content
strings remain more or less intact - so that there is only ever one
index.phtml - but the server loads the translated strings during the PHP run.
It's runtime translation, much as gettext does already.
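
A minimal sketch of that runtime lookup, written in Python rather than the
site's PHP purely so that it is self-contained. It assumes Python's standard
gettext module, which searches <localedir>/<lang>/LC_MESSAGES/<domain>.mo,
i.e. the same layout as the proposed htdocs/de/LC_MESSAGES/gnucash-htdocs.mo.
The "htdocs" path and the load_catalogue helper are hypothetical names, not
part of the actual site.

```python
import gettext

DOMAIN = "gnucash-htdocs"   # matches the .mo file name in the proposed layout
LOCALE_DIR = "htdocs"       # hypothetical path: the parent of en/, de/, ...


def load_catalogue(lang, localedir=LOCALE_DIR):
    """Return a translations object for lang, or a pass-through fallback."""
    # fallback=True yields NullTranslations when no .mo file is found,
    # so an untranslated language still renders the original strings.
    return gettext.translation(
        DOMAIN, localedir=localedir, languages=[lang], fallback=True
    )


catalogue = load_catalogue("de")
_ = catalogue.gettext

# There is only ever one page source; the string is looked up at run time.
print("<h1>%s</h1>" % _("Welcome to GnuCash.org"))
```

If no German catalogue exists yet, the original English string is printed
unchanged, which is the behaviour wanted while translations are pending.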

All the preparation of the translations happens prior to the svn commit - svn
will contain the translated binaries, just like it contains the binary
images. I know this goes against the grain for those more used to automake.

> Also, does it then make sense to structure the files so that Apache can do
> language content negotiation?

At present, I'm looking at browser/PHP language negotiation with a
direct-link fallback.
http://www.grep.be/data/accept-to-gettext.inc
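
As a rough illustration of that negotiation (this is not the code at the URL
above, and it is in Python rather than PHP only to keep it self-contained),
the sketch below parses an Accept-Language header, honours an explicit
language chosen via a direct link, and otherwise falls back to a default. The
function names and the "en" default are assumptions.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without q= counts as q=1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Negated quality sorts high q first; position keeps header
            # order when two tags have the same q.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(header, supported, default="en", override=None):
    """Pick a language: direct link first, then browser, then default."""
    if override in supported:
        return override
    for tag in parse_accept_language(header):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # e.g. "de-at" falls back to "de"
        if primary in supported:
            return primary
    return default


print(choose_language("de-AT,de;q=0.8,en;q=0.5", {"en", "de"}))
```

The override argument stands in for the direct language links in the common
header file; the browser setting is only consulted when no such link was used.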

> <http://httpd.apache.org/docs/2.0/content-negotiation.html>  It would
> basically mean a different structure; one like:
>
>  htdocs/www/index.html.en
>  htdocs/www/index.html.de
>  htdocs/www/index.html.[...]
>  htdocs/www/features.html.{en,de,...}

But gettext doesn't work that way - it would mean generating all those pages
manually at each svn commit. So far, the delay in loading the pages is not
significant. That structure would replace 12 .mo files with 144 translated
copies of the same page. Some of these pages are BIG!

It's the same mechanism, just implemented in the PHP instead of the server.
You still need the same fallback with the apache structure for those people
whose browsers don't contain the correct language variable. Both methods rely
on detecting what the browser supports - the PHP method uses one binary file
and one text file for each translation and a single content page for all
languages. The Apache method uses one page for every possible permutation of
language and content. AFAICT, PHP uses N+(L*2) where N is the number of
content pages and L the number of languages; Apache would seem to use N*L.

There are 234 phtml pages, 155 text files and 25 php scripts in the current
design. With 5 languages, the PHP method could mean 424 files; the apache
method (if I'm reading it right) would mean 2,070.
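
The arithmetic behind those two figures can be checked with a short sketch
(Python here purely for convenience). It follows the N+(L*2) and N*L formulas
above and, like them, assumes every one of the 414 existing files would be
copied once per language under the Apache layout; the function names are
illustrative only.

```python
def php_file_count(content_files, languages):
    # One shared copy of each content file, plus one .po and one .mo
    # file per language: N + (L * 2).
    return content_files + 2 * languages


def apache_file_count(content_files, languages):
    # One copy of every content file for every language: N * L.
    return content_files * languages


content_files = 234 + 155 + 25  # phtml pages + text files + php scripts
languages = 5

print(php_file_count(content_files, languages))
print(apache_file_count(content_files, languages))
```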

> > To protect the .svn directories below www/, this snippet should be added
> > to the virtual host config:
> >         <Directory /opt/svn/htdocs/www/*/.svn>
> >                 Order Deny,Allow
> >                 deny from all
> >         </Directory>
>
> I suppose this, or maybe using `svn export` so the .svn files aren't
> even created?

That's wasteful - there are lots of components of the new site that never need
to be replaced and deleting those (so that export won't complain) then
re-creating them is more than a little pointless. (Unless svn export has
solved the cvs problem of exporting to an existing directory.)


Re: htdocs and gettext

Josh Sled
On Mon, 2006-01-23 at 17:45 +0000, Neil Williams wrote:
> Umm, no. If anything, PHP becomes a little more important as although the
> TRANSLATIONS are handled via PO, the CONTENT strings now need to be wrapped
> in php functions.

Oh, that's too bad.  I'd rather we were less coupled to PHP.  I don't
see why it's necessary for our web site, which is pretty much entirely
static.


> There are 234 phtml pages, 155 text files and 25 php scripts in the current
> design. With 5 languages, the PHP method could mean 424 files; the apache
> method (if I'm reading it right) would mean 2,070.

Well, I'm not sure why it's L*2 in one approach vs. L in the other...
and I think there'd be about the same number of source files (subtract
the php scripts but adding Makefiles)...

But who cares about a bunch of generated files?  Last I checked, disk
space was effectively free. :)  In any case, we can do all the work
once, or do it on each page request.  While it doesn't appear to be
slowing the site down, one option requires PHP and one doesn't.

Re: htdocs and gettext

Neil Williams-2
In reply to this post by Neil Williams-2
On Monday 23 January 2006 5:45 pm, Neil Williams wrote:
> On Monday 23 January 2006 5:04 pm, Josh Sled wrote:
> > On Mon, 2006-01-23 at 16:33 +0000, Neil Williams wrote:
> > > OK. I've got gettext working with PHP to translate a single website
> > > using as many PO files as we can create.
> >
> > Awesome.
> >
> :-)  It's not ready to commit as a demo yet, but it does work locally.

n.b. there are now two externals directories - one that is being read by
mail-search but which is redundant and one that is under the new www/ which
can be compressed to remove the en/ subdirectory.

Sorting that out is the next stage (tips welcome!).

The rest of the plan is now:

1. Convert all remaining .phtml files to identify translatable strings.
     a) Each translatable string needs to be on a single line - PHP doesn't
have the concatenation behaviour of C where "x " "y" are added before being
passed to gettext. If in doubt, see www/index.phtml.
     b) Wherever possible, keep paragraphs together but avoid including too
many HTML tags in the translatable string, especially <a href=""> as the
quotes need to be escaped and may not survive translation.
     c) I've committed a few helper files which may be removed eventually.

2. News items will be put up for translation by the news-script so don't do
anything with those.

3. Other languages are not automatically supported yet - particularly at my
demo site.

4. Only the index page is actually translated by the server at the moment; I
need to extend support by using a slightly modified gettext.php in each file.
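
To show why point 1a matters, here is a deliberately naive scanner for the
_("...") markers, written in Python as a stand-alone illustration. It is not
how the POT file is actually generated (a proper extraction tool would
normally do that job), and the regular expression and function name are
assumptions. A string that is split and joined with PHP's "." operator is
simply not found by a scanner like this.

```python
import re

# Match _("...") where the whole string is one double-quoted literal.
# The lookbehind avoids matching names that merely end in an underscore.
MARKER = re.compile(r'(?<![A-Za-z0-9_])_\(\s*"((?:[^"\\]|\\.)*)"\s*\)')


def find_marked_strings(source):
    """Return the translatable strings marked with _("...") in source."""
    return [match.group(1) for match in MARKER.finditer(source)]


page = (
    '<h1><?php echo _("Welcome to GnuCash.org")?></h1>\n'
    '<p><?php echo _("Free " . "software") ?></p>\n'
)

# Only the first string is reported; the concatenated one is missed.
print(find_marked_strings(page))
```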

If anyone feels like helping with the identification of translatable strings,
feel free to commit.
:-)

Translators should wait a little longer until more of the content is marked up
for gettext. However, if you want to make a start, please do.

Re: htdocs and gettext

Neil Williams-2
In reply to this post by Josh Sled
On Monday 23 January 2006 6:16 pm, Josh Sled wrote:
> On Mon, 2006-01-23 at 17:45 +0000, Neil Williams wrote:
> > Umm, no. If anything, PHP becomes a little more important as although the
> > TRANSLATIONS are handled via PO, the CONTENT strings now need to be
> > wrapped in php functions.
>
> Oh, that's too bad.  I'd rather we were less coupled to PHP.

It's still useful, even though our content isn't as dynamic as a
database-driven site. e.g. the News items would be a royal PITA to handle via
SHTML or "baking" because the resulting index page would be a nightmare to
edit.

<asbestos>
I like PHP as much as you or Derek may like Emacs/Scheme so we aren't going to
see eye to eye on this one! Now if only Scheme was as easy as PHP . . . .
</asbestos>
:-)

We do have dynamic or at least changeable pages and scripts, there is some
automation and code reuse. It would be pointless to leave all that behind.

Right now, if someone commits a new News item, it would appear instantly.
That's useful.

> I don't
> see why it's necessary for our web site, which is pretty much entirely
> static.

and the reason it has to stay that way is ? ??? There are more dynamic things
we can do with the site once this is all done.

However, PHP is already worth using, IMHO, if only because of the inherent
code reuse.

CodeHelp has lots on PHP but there's only tiny database usage behind it (and
that only as a demo). A site doesn't have to have a database to benefit from
PHP - or to be termed "dynamic".

Maybe we could parse XML to make it anonymous (for bug reports), maybe we
could do lots of other things - just because gnucash.org has *been* a little
static until now doesn't mean it should stay that way.

Re: htdocs and gettext

Derek Atkins
In reply to this post by Neil Williams-2
Neil Williams <[hidden email]> writes:

> To protect the .svn directories below www/, this snippet should be added to
> the virtual host config:
>         <Directory /opt/svn/htdocs/www/*/.svn>
>                 Order Deny,Allow
>                 deny from all
>         </Directory>

Why not just use a .htaccess for this?

-derek
--
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       [hidden email]                        PGP key available

Re: htdocs and gettext

Josh Sled
In reply to this post by Neil Williams-2
On Mon, 2006-01-23 at 18:50 +0000, Neil Williams wrote:
> It's still useful, even though our content isn't as dynamic as a
> database-driven site. e.g. the News items would be a royal PITA to handle via
> SHTML or "baking" because the resulting index page would be a nightmare to
> edit.

Why would it be a "nightmare"?  I'm suggesting running the processor
(PHP if you like) once, rather than during every page-load.  It's just
string-substitution...
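
A minimal sketch of what such a one-off "baking" pass might look like, in
Python purely for illustration. It assumes pages marked up with the
<?php echo _("...")?> form shown earlier in the thread and one in-memory
catalogue per language; the function names, the plain-dict catalogues and the
German string are all hypothetical.

```python
import re

# Matches the marker form used in the thread: <?php echo _("...")?>
MARKER = re.compile(r'<\?php echo _\("((?:[^"\\]|\\.)*)"\)\s*;?\s*\?>')


def bake(template, catalogue):
    """Replace each marked string with its translation, else the original."""
    return MARKER.sub(
        lambda m: catalogue.get(m.group(1), m.group(1)), template
    )


def bake_all(page_name, template, catalogues):
    """Map output names such as index.html.de to the baked page text."""
    return {
        "%s.%s" % (page_name, lang): bake(template, catalogue)
        for lang, catalogue in catalogues.items()
    }


template = '<h1><?php echo _("Welcome to GnuCash.org")?></h1>'
catalogues = {
    "en": {},  # empty catalogue: original strings pass through
    "de": {"Welcome to GnuCash.org": "Willkommen bei GnuCash.org"},
}

for name, text in sorted(bake_all("index.html", template, catalogues).items()):
    print(name, text)
```

The output names follow the index.html.en / index.html.de layout from the
content-negotiation suggestion, so the server would only ever see static
files.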


> <asbestos>
> I like PHP as much as you or Derek may like Emacs/Scheme so we aren't going to
> see eye to eye on this one! Now if only Scheme was as easy as PHP . . . .
> </asbestos>
> :-)

... The difference being that I don't care if you don't use emacs, but
the web site is part of the project.

[And FTR, I've been very vocal in saying that I think we should
*eliminate* scheme from gnucash. So you might want to revise your
stereotype.  Frankly, I think we should remove scheme from gnucash for
the same reason as removing php from the website: they are Excess
Complexity.]


> and the reason it has to stay that way is ? ??? There are more dynamic things
> we can do with the site once this is all done.

Like what?

I've not heard anything suggested that would make me change my
mind.  As I look around other projects' websites and think about the
last N years of both gnucash and every useful open-source project
website I've run across, while trying to find information, I'm pretty
convinced that:

- simple is better than complex.

- concise is better than simple.

- static is simpler than dynamic.


> Maybe we could parse XML to make it anonymous (for bug reports), maybe we
> could do lots of other things - just because gnucash.org has *been* a little
> static until now doesn't mean it should stay that way.

Or, maybe we don't do those things.  We don't need them, and they are
just one more moving part we can do without.  Or, when we do find that
we need them, we can use a scripting language for JUST that piece.

In any case, I think the website should be made (and subsequently stay)
much simpler than it is now.

Re: htdocs and gettext

Neil Williams-2
On Tuesday 24 January 2006 4:19 am, Josh Sled wrote:
> > It's still useful, even though our content isn't as dynamic as a
> > database-driven site. e.g. the News items would be a royal PITA to handle
> > via SHTML or "baking" because the resulting index page would be a
> > nightmare to edit.
>
> Why would it be a "nightmare"?  I'm suggesting running the processor
> (PHP if you like) once, rather than during every page-load.  It's just
> string-substitution...

Depends on what we finally do with NEWS. If it's going to go with RSS then the
way we add news needs to change anyway.

TBH, you're proposing larger changes than I had anticipated. If most (all) of
the dynamic content is going to the wiki or simply /dev/null, then the rest
is up for grabs.

> [And FTR, I've been very vocal in saying that I think we should
> *eliminate* scheme from gnucash. So you might want to revise your
> stereotype.

Sorry.

> > Maybe we could parse XML to make it anonymous (for bug reports), maybe we
> > could do lots of other things - just because gnucash.org has *been* a
> > little static until now doesn't mean it should stay that way.
>
> Or, maybe we don't do those things.  We don't need them, and they are
> just one more moving part we can do without.  Or, when we do find that
> we need them, we can use a scripting language for JUST that piece.
>
> In any case, I think the website should be made (and subsequently stay)
> much simpler than it is now.

True (and with the scale of changes proposed on another thread, eminently
achievable) but I'm not convinced that the Apache method involving all that
duplication will actually simplify edits. I think it's easier to have one
site translated on-the-fly.

Re: htdocs and gettext

Neil Williams-2
In reply to this post by Derek Atkins
On Monday 23 January 2006 10:57 pm, Derek Atkins wrote:
> Neil Williams <[hidden email]> writes:
> > To protect the .svn directories below www/, this snippet should be added
> > to the virtual host config:
> >         <Directory /opt/svn/htdocs/www/*/.svn>
> >                 Order Deny,Allow
> >                 deny from all
> >         </Directory>
>
> Why not just use a .htaccess for this?

Sorry for the delay, I did try .htaccess but <Directory> isn't supported
from .htaccess, only <Files> which, unfortunately, I couldn't get to work
with this type of wildcard.

If someone has a method using .htaccess, I'll use it.
