HTTrack Website Copier
Free software offline browser - FORUM
Subject: CSS file only partially parsed
Author: Cédric
Date: 12/11/2019 16:53
 
Hi,

I'm trying to build a local mirror of some articles from WikiMini, using this
command line to fetch one article with depth 3:

httrack https://fr.wikimini.org/wiki/Barbapapa -O mirrorwikimini --depth=3
--near --can-go-up-and-down --replace-external --user-agent "Mozilla/5.0
(Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1"
"-*wiki*/**[name]:*" "-*wiki*/w"

The local mirror doesn't look too bad, but I discovered something that puzzles
me: the main CSS file of the article is correctly detected, correctly
downloaded, and parsed for embedded links (CSS url(...) occurrences)... but
only to a certain point.

If you try the httrack command given above, the main CSS file will be stored
locally at fr.wikimini.org\w\loade39c.css

It is a big minified CSS file (186 KB), containing only one very long line.

If you look more closely in the file, you'll see that all references to
url(...) are correctly replaced with their local counterpart (and the matching
file is indeed present in the local mirror)... but only up to position
(column) 23021 in the file.

None of the url(...)'s past that point are replaced with a local URL, and the
matching file has not been downloaded in the local mirror. For example, the
url(...) at position 23296 does not work because of that.
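
For anyone who wants to reproduce this check, here is a minimal Python sketch that lists every url(...) in a CSS file with its column and reports the first one that still looks like a remote URL. It is not part of HTTrack. The function names are made up for illustration, the regular expression is a simplification of real CSS syntax, and the "looks remote" test assumes that rewritten links are relative paths while untouched ones are absolute or root-relative.

```python
import re
import sys

# Simplified matcher for url(...) with optional quotes (not a full CSS parser).
URL_RE = re.compile(r"url\(\s*['\"]?([^'\")]+?)['\"]?\s*\)", re.IGNORECASE)


def list_css_urls(css_text):
    """Return (column, target) pairs for each url(...); columns are 1-based."""
    return [(m.start() + 1, m.group(1)) for m in URL_RE.finditer(css_text)]


def looks_remote(target):
    """Heuristic (assumption): absolute or root-relative targets were not rewritten."""
    return target.startswith(("http://", "https://", "//", "/"))


def first_unrewritten(css_text):
    """Return the (column, target) of the first url(...) that still looks remote."""
    for column, target in list_css_urls(css_text):
        if looks_remote(target):
            return column, target
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_css_urls.py path/to/mirrored.css
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    urls = list_css_urls(text)
    print(f"{len(urls)} url(...) occurrences found")
    print("first one that still looks remote:", first_unrewritten(text))
```

Running it on the mirrored copy of the CSS file should show whether the first untouched url(...) really sits just after column 23021.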

You can see this in the local mirror in your browser, where the main mouse
cursor is the same as on the online version of the page, BUT other cursors
(such as the "link" and "edit" cursors) do not work on the local version. That
is because only one cursor (url(<file>.cur)) from the CSS file has been parsed
and downloaded. The other ones, which appear further in the CSS file, are
missing.

I don't understand why this happens.

I've tried reading extra and debug logs, but I don't see anything wrong,
except that the missing URLs are not mentioned at all there. It seems the
downloads are not even attempted.

Is there a file size limit in the parser? Or a line length limit, after which
it stops parsing altogether?
I tried some unrelated options such as --structure, --priority, and some
others, but it did not improve the mirror. I also tried to change the target
directory for the mirror, but did not see any improvement there either.

Any ideas?
Thanks.
 