HTTrack Website Copier
Free software offline browser - FORUM
Subject: HELP! I'm stuck
Author: JAG
Date: 05/02/2004 04:46
 
I have a problem that doesn't have to do directly with 
this program, but since I ran into it while using this 
program, I thought I might ask. 

I ran into the problem using your hts-cache/new.txt file. 
I was using this file to scan all the links on a page and 
list them. I would then open this file and go to column I 
which lists all the URLs scanned.

Then I would save the new.txt file as a .csv file, open 
it in a spreadsheet, and cut and paste the column with 
the URLs into its own txt file (pretty much just 
extracting the URLs from your log file).

Well, that's where I ran into a problem. On one of my 
search and scans, the URLs ended up being too long for a 
csv file. It cuts the end of the address off.
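
One way to sidestep the spreadsheet step might be to read the 
URL column straight out of the log. The sketch below is only 
an illustration: it assumes new.txt is tab-separated and that 
the URLs sit in the ninth field (column I, index 8), which is 
worth checking against the actual file. The function name 
extract_column is made up for this example.

```python
import sys


def extract_column(lines, index=8, delimiter="\t"):
    """Return the field at `index` from every line that has one.

    Assumption: the log is tab-separated and the URL is the ninth
    field (index 8). Adjust `index` if the file is laid out differently.
    """
    values = []
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        # Skip short lines and lines whose target field is empty.
        if len(fields) > index and fields[index].strip():
            values.append(fields[index].strip())
    return values


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python extract_column.py hts-cache/new.txt urls.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as src:
        urls = extract_column(src)
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        dst.write("\n".join(urls) + "\n")
```

Because each line is split as plain text, the length of a URL 
does not matter here. If the file starts with a header row, 
its label for that column will be the first entry in the 
output.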

Here is an example of an address...

http://patimg1.uspto.gov/.piw?Docid=06686531&homeurl=http%
3A%2F%2Fpatft.uspto.gov%2Fnetacgi%2Fnph-Parser%3FSect1%
3DPTO2%2526Sect2%3DHITOFF%2526u%3D%2Fnetahtml%2Fsearch-
adv.htm%2526r%3D5%2526f%3DG%2526l%3D50%2526d%3DPTXT%2526p%
3D1%2526p%3D1%2526S1%3D(((('electric%252Bbass'%252BOR%
252B'electric%252Bbasses')%252BOR%252B'bass%252Bguitar')%
252BOR%252B'bass%252Bguitars')%252BAND%252B(84%2F$.CIOR.%
252Bor%252B84%2F$.CIXR.%252Bor%252B84%2F$.CIUX,CIDX.))%
2526OS%3D%252B(%252522electric%252Bbass%252522%252Bor%252B%
252522electric%252Bbasses%252522%252Bor%252B%252522bass%
252Bguitar%252522%252Bor%252B%252522bass%252Bguitars%
252522)%252Band%252Bccl%2F84%2F$%2526RS%3D((((%
252522electric%252Bbass%252522%252BOR%252B%252522electric%
252Bbasses%252522)%252BOR%252B%252522bass%252Bguitar%
252522)%252BOR%252B%252522bass%252Bguitars%252522)%252BAND%
252BCCL%2F84%2F$)
&PageNum=&Rtype=&SectionNum=&idkey=3C3DDBF631F6

Part of the reason the address is so long is that it 
contains a homeurl field, which holds all the parameters 
of the original query. If I take out the homeurl field, the 
address still works but looks like this:

http://patimg1.uspto.gov/.piw?Docid=06686531&PageNum=&Rtype=&SectionNum=&idkey=3C3DDBF631F6

OK, so if I could find a program that takes a list of URLs 
in text format and strips out the homeurl field, I 
could really use it.
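
A minimal sketch of such a program in Python, using only the 
standard library. It assumes one URL per line in the input 
file and that the field is always named homeurl; the function 
name strip_param and the file names are made up for this 
example. The query string is split on "&" as plain text 
instead of being decoded and re-encoded, so the remaining 
fields come out exactly as they went in.

```python
import sys
from urllib.parse import urlsplit, urlunsplit


def strip_param(url, name="homeurl"):
    """Return `url` with every query field called `name` removed."""
    parts = urlsplit(url)
    kept = [
        pair
        for pair in parts.query.split("&")
        # Compare only the key (text before the first "="), ignoring case.
        if pair.split("=", 1)[0].lower() != name.lower()
    ]
    return urlunsplit(parts._replace(query="&".join(kept)))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python strip_param.py urls.txt urls_short.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as src:
        cleaned = [strip_param(line.strip()) for line in src if line.strip()]
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        dst.write("\n".join(cleaned) + "\n")
```

This relies on the homeurl value containing no raw "&", which 
holds for the example address above because the ampersands 
inside it are percent-encoded.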

I COULD do it manually but the list I have is over 2000 
URLs long and it is only one of many searches. If I did 
this manually it would take days.

Can anyone help? I usually search the web for this kind of 
stuff but I don't even know where to look. Point me in the 
right direction please.

Joe
 