Opened 16 years ago
Last modified 11 years ago
#612 new defect
g.html2man: parsing leads to man page errors
Reported by: | hamish | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 6.5.0 |
Component: | Docs | Version: | svn-develbranch6 |
Keywords: | g.html2man, utf8 | Cc: | |
CPU: | All | Platform: | Unspecified |
Description
Hi,
tools/g.html2man has a number of parsing problems.
there are a few like cairodriver.1 which happen to start lines with ".", which the man program then parses as a control line. e.g. cairodriver lists output formats, and the '.pn' of .png gets hijacked, so all those image types end up missing from the resulting man page.
another popular one is <OL><LI> becoming ..IP instead of .IP (e.g. pngdriver.1 just after "Example")
and yet another is g.parser.1 where #%multiple: gets eaten.
detailed list of errors is here: (scroll down to 'grass-doc') http://lintian.debian.org/maintainer/pkg-grass-devel@lists.alioth.debian.org.html#grass
Hamish
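For illustration, a minimal sketch of how the leading-"." problem could be avoided, assuming the converter can tell body text apart from the roff requests it emits itself. The function name `escape_leading_control` is hypothetical and is not taken from g.html2man. It relies on the roff escape `\&` (a zero-width, non-printing character), which stops a leading "." or "'" from being read as a control character.

```python
def escape_leading_control(text: str) -> str:
    """Protect text lines that start with '.' or "'" from roff.

    Hypothetical helper: apply it to body text only, never to the
    requests (.IP, .SH, ...) that the converter emits on purpose.
    """
    out = []
    for line in text.split("\n"):
        if line.startswith((".", "'")):
            # \& is zero-width, so the rendered output is unchanged
            line = "\\&" + line
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(escape_leading_control(".png\nplain text\n'quoted line"))
```

Text such as ".png" then reaches man as ordinary text instead of an unknown request.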
Change History (11)
comment:1 by , 16 years ago
follow-up: 3 comment:2 by , 16 years ago
Also there are a zillion cases of -flag and --flag using a bare '-', which is interpreted as a hyphen, not a minus sign. i.e. '-' must be quoted as '\-'. See:
http://lintian.debian.org/tags/hyphen-used-as-minus-sign.html
I looked, but I've got no idea how to backport this stuff to the perl version. does the perl version still need to be there in trunk?
Hamish
follow-up: 4 comment:3 by , 16 years ago
Replying to hamish:
Also there are a zillion cases of -flag and --flag using a bare '-', which is interpreted as a hyphen, not a minus sign. i.e. '-' must be quoted as '\-'. See:
http://lintian.debian.org/tags/hyphen-used-as-minus-sign.html
How is the script supposed to determine whether a '-' in the HTML is a minus or a hyphen? For now, I've changed it to convert all occurrences of '-' to '\-'.
I looked, but I've got no idea how to backport this stuff to the perl version. does the perl version still need to be there in trunk?
No.
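A rough sketch of the two options discussed above, for illustration only. `escape_all_hyphens` mirrors the blanket conversion described in this comment. `escape_option_hyphens` is a hypothetical heuristic (not what the script does) that only escapes one or two dashes at the start of a word, on the assumption that those are flags.

```python
import re


def escape_all_hyphens(text: str) -> str:
    """Blanket approach: every '-' becomes '\\-'."""
    return text.replace("-", "\\-")


# One or two dashes that start a word: not preceded by a word
# character, a backslash (already escaped) or another dash, and
# followed by a word character.
_OPTION_DASHES = re.compile(r"(?<![\w\\-])(-{1,2})(?=\w)")


def escape_option_hyphens(text: str) -> str:
    """Heuristic approach: escape only dashes that look like flags."""
    return _OPTION_DASHES.sub(lambda m: "\\-" * len(m.group(1)), text)


if __name__ == "__main__":
    sample = "use -f or --flag in multi-word text"
    print(escape_all_hyphens(sample))
    print(escape_option_hyphens(sample))
```

The heuristic leaves "multi-word" alone but will still misjudge cases such as negative numbers, so the blanket version is the safer default.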
comment:4 by , 16 years ago
Replying to glynn:
How is the script supposed to determine whether a '-' in the HTML is a minus or a hyphen?
fwiw, lintian's perl detection goes like: http://ftp.de.debian.org/debian/pool/main/l/lintian/lintian_2.2.10.tar.gz
    # Catch hyphens used as minus signs by looking for ones at the
    # beginning of a word, but don't generate false positives on \s-1
    # (small font), \*(-- (pod2man long dash), or things like \h'-1'.
    if ($line =~ /^(
                    ([^\.].*)?
                    [\s\'\"\`\(\[]
                    (?<! \\s | \*\( | \(- | \w\' )
                   )?
                   (--?\w+)/ox) {
For now, I've changed it to convert all occurrences of '-' to '\-'.
ok; cosmetic rendering errors are better than syntax ones I guess.
cheers, Hamish
follow-up: 6 comment:5 by , 16 years ago
some fixes for non-module help pages (were causing 'mandb -c' whatis errors) in devbr6 and trunk in r37877, ..
hope for testing feedback before backporting to 6.4.
Hamish
follow-up: 7 comment:6 by , 16 years ago
Replying to hamish:
some fixes for non-module help pages (were causing 'mandb -c' whatis errors) in devbr6 and trunk in r37877, ..
The changes are meaningless; g.html2man.py discards all comments.
To get a suitable whatis entry, the HTML file needs to include a <h2>NAME</h2> section containing the module name followed by a dash then the description. This is added automatically by --html-description, but non-module pages will need to have it added manually.
follow-up: 8 comment:7 by , 16 years ago
Replying to glynn:
The changes are meaningless;
a few qualifiers on that are appropriate: a) currently; b) just for the python version in gr7. (the perl version in all Gr versions now knows about it)
g.html2man.py discards all comments.
the solution I used in the perl version was to check for that meta tag before the comment stripping code.
To get a suitable whatis entry, the HTML file needs to include a <h2>NAME</h2> section containing the module name followed by a dash then the description. This is added automatically by --html-description, but non-module pages will need to have it added manually.
yeah, I looked at doing that first. But the <H2>NAME really wasn't appropriate for the intro and driver custom HTML pages I looked at, and so I went with the meta-tag solution.
I couldn't see how to make that work with the python version (does HTMLParser.py strip out the comments before we can get our hands on them?), and so I left it for now.
Hamish
comment:8 by , 16 years ago
Replying to hamish:
yeah, I looked at doing that first. But the <H2>NAME really wasn't appropriate for the intro and driver custom HTML pages I looked at
I'm not so sure.
and so I went with the meta-tag solution.
Actual <meta> tags would be a reasonable solution for any files which genuinely shouldn't have a NAME section, e.g.
<meta name="name" content="grass-dbf" scheme="GRASS">
I couldn't see how to make that work with the python version (does HTMLParser.py strip out the comments before we can get our hands on them?), and so I left it for now.
It's possible to add a handler for comments, but I don't consider this appropriate.
Comments are comments; you are supposed to be able to use them as you wish, without any consequences. The only situation where it's appropriate for an application to take note of comments in its input is if it intends to include them as comments in its output.
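As a sketch of the <meta> approach, assuming Python 3's `html.parser` module (the script at the time used the Python 2 `HTMLParser` module, which has the same handler methods). The class name `NameMetaParser` and the attribute `man_name` are made up for this example. It also shows that the parser does not strip comments before the application sees them: they are passed to `handle_comment`, whose default implementation does nothing.

```python
from html.parser import HTMLParser


class NameMetaParser(HTMLParser):
    """Hypothetical parser that picks up <meta name="name" ...>."""

    def __init__(self):
        super().__init__()
        self.man_name = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased
        if tag == "meta":
            attr = dict(attrs)
            if attr.get("name") == "name" and attr.get("scheme") == "GRASS":
                self.man_name = attr.get("content")

    def handle_comment(self, data):
        # Comments are delivered here; deliberately ignored
        pass


if __name__ == "__main__":
    parser = NameMetaParser()
    parser.feed(
        "<html><head><!-- just a comment -->"
        '<meta name="name" content="grass-dbf" scheme="GRASS">'
        "</head><body></body></html>"
    )
    print(parser.man_name)
```

The name comes from real markup here, so comments stay free of side effects.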
comment:9 by , 12 years ago
Milestone: | 6.5.0 → 7.0.0 |
---|---|
Version: | 6.4.0 RCs → svn-trunk |
(since related to g.html2man.py, bumping to trunk)
comment:10 by , 12 years ago
Milestone: | 7.0.0 → 6.5.0 |
---|---|
Version: | svn-trunk → svn-develbranch6 |
Actually the situation with the (rewritten) .py version is better; this bug has to do with the Perl version in 6.x.
see
http://lintian.debian.org/maintainer/pkg-grass-devel@lists.alioth.debian.org.html#grass
scroll down to the "grass-doc" package and see the many warnings about manpage-has-bad-whatis-entry and manpage-has-errors-from-man.
Hamish
comment:11 by , 11 years ago
Keywords: | utf8 added |
---|
(G6.x only)
re. man treating flag names as hyphens and breaking them for cut & paste when utf8 is used, here's some post-processing sed regex to catch many of them:
    sed -i -e 's/\([ ([]\)-\([a-z]\)/\1\\-\2/g' \
        -e 's/\([ []\)--\([a-z]\)/\1\\-\\-\2/g' \
        -e 's/\[-\\fB/[\\-\\fB/' \
        -e 's/\[--\\fB/[\\-\\-\\fB/g' \
        -e 's/"\\fB-\([a-zA-Z0-9]\)/"\\fB\\-\1/' \
        -e 's/"\\fB--\([a-zA-Z0-9]\)/"\\fB\\-\\-\1/' \
        "$man_page"
Hamish
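For anyone who would rather do this inside a Python build step, here is a rough translation of the sed expressions above, offered as a sketch rather than a tested replacement. The function name `escape_flags` and the `RULES` table are invented for this example. A count of 0 stands in for sed's "g" flag and a count of 1 for its default of one substitution per line.

```python
import re

# (pattern, replacement, count), in the same order as the sed -e options
RULES = [
    (r"([ (\[])-([a-z])", r"\1\\-\2", 0),
    (r"([ \[])--([a-z])", r"\1\\-\\-\2", 0),
    (r"\[-\\fB", r"[\\-\\fB", 1),
    (r"\[--\\fB", r"[\\-\\-\\fB", 0),
    (r'"\\fB-([a-zA-Z0-9])', r'"\\fB\\-\1', 1),
    (r'"\\fB--([a-zA-Z0-9])', r'"\\fB\\-\\-\1', 1),
]


def escape_flags(page_text: str) -> str:
    """Apply the rules line by line, as sed would."""
    fixed = []
    for line in page_text.split("\n"):
        for pattern, repl, count in RULES:
            line = re.sub(pattern, repl, line, count=count)
        fixed.append(line)
    return "\n".join(fixed)


if __name__ == "__main__":
    print(escape_flags("run with -q or --quiet"))
    print(escape_flags("[-\\fBa\\fR]"))
```

Like the sed version, this only catches many of the cases, not all of them.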
Replying to hamish:
I've committed some fixes in r37386. Apart from escaping dots and single quotes at the beginning of a line, it no longer removes leading whitespace from pre-formatted text and no longer inserts line breaks within .IP "..." (this last one only affected d.graph).
I can't reproduce these.
Note that the "bad whatis" entries correspond to an HTML file which lacks a description in the NAME section. This generally only occurs with HTML files which aren't generated from --html-description.