
Short:        V1.02 Extract URLs from any file+sort++
Author:       frans xfilesystem.freeserve.co.uk (Francis Swift)
Uploader:     frans xfilesystem freeserve co uk (Francis Swift)
Type:         comm/www
Architecture: m68k-amigaos
Date:         1999-09-06
Replaces:     urlx.lha
Download:     comm/www/urlx.lha
Readme:       comm/www/urlx.readme
Downloads:    719
Some quick'n'nasty hacks, but I've included the source for you to look
at, especially as urlx uses btree routines and there aren't that many
simple examples of using btrees.
The btree routines used are by Christopher R. Hertel and are available
in full on the Aminet as BinaryTrees.lzh in dev/c.
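For a feel of how the tree is used, here is a minimal hedged sketch of
inserting strings into a binary tree so that duplicates are silently
dropped and an in-order walk prints the survivors in sorted order. The
node layout and helper names are my own illustration, not the actual
ubi_BinTree API (see ubi_BinTree.h in the archive for the real interface).

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Illustrative node; ubi_BinTree uses its own ubi_trNode header. */
  typedef struct node {
      struct node *left, *right;
      char *url;
  } node;

  /* Insert url unless an equal key is already present (a duplicate). */
  static node *insert(node *root, const char *url)
  {
      if (root == NULL) {
          node *n = malloc(sizeof *n);
          n->left = n->right = NULL;
          n->url = strdup(url);
          return n;
      }
      int cmp = strcmp(url, root->url);
      if (cmp < 0)      root->left  = insert(root->left, url);
      else if (cmp > 0) root->right = insert(root->right, url);
      /* cmp == 0: duplicate URL, silently ignored */
      return root;
  }

  /* In-order walk prints the stored URLs in sorted order. */
  static void walk(const node *root)
  {
      if (root == NULL) return;
      walk(root->left);
      puts(root->url);
      walk(root->right);
  }

  int main(void)
  {
      node *t = NULL;
      t = insert(t, "http://example.com/b");
      t = insert(t, "http://example.com/a");
      t = insert(t, "http://example.com/b");  /* duplicate, dropped */
      walk(t);                                /* prints a then b    */
      return 0;
  }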
V1.02
-----
Some bugfixes/improvements in scanv, plus a new template option in urlx,
for which I've included an example template file for one particular
version of Voyager. Use something like
urlx -p -a -u -t temp_voyager infile Bookmarks.html
to get an HTML bookmarks file.
V1.01
-----
Added functionality to scanv so it can be used instead of treecat
for the Voyager cache only. This eliminates some of the bogus URLs
that were thrown up by the previous method (below) using treecat|urlx.
The new method for scanning the Voyager cache (from sh/pdksh) is, e.g.,
scanv -c dh0:Voyager/cache | urlx -p -u - outfile
which uses the new -c flag to cat (output) the contents of each file,
which is then piped through urlx for processing. Of course, treecat is
still necessary for other caches, e.g. AWeb and Netscape.
urlx
----
This program searches a file for URLs (http:// etc.) and prints them
or outputs them to a file. Internally it stores them in a btree so that
duplicates can be eliminated and, optionally, the output can be sorted.
There are various options:
-s      selects a simple alphabetic sort for the output
-u      selects a special URL sort that should provide better grouping
        of similar site names (basically it sorts the first URL element
        in groups backwards; see the sketch just after this list)
-h      selects HTML output format for making quick bookmark files,
        instead of the default straight text output
-t <file>  uses a template file for output formatting
-p      retains parameters after URLs; by default these are ignored
-a      allows accented characters in URLs (i.e. characters > 127)
-.<ext> selects just files with extension .<ext>; for example, to show
        only .jpg URLs you would use -.jpg, and for .html you would
        use -.htm (which matches both .htm and .html)
-i      a special file selection option which tries to intelligently
        select only URLs that are likely to be HTML, both by using
        the extension and by examining the path
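To make the -u grouping concrete, here is a hedged sketch of one way to
compare hostnames "backwards", label by label from the right, so that
e.g. www.foo.com and ftp.foo.com land next to each other. This is my
reading of the description above, not the actual comparison in urlx.c.

  #include <stdio.h>
  #include <string.h>

  /* Compare two dotted hostnames starting from the rightmost label. */
  static int host_cmp_backwards(const char *a, const char *b)
  {
      const char *pa = a + strlen(a);
      const char *pb = b + strlen(b);
      for (;;) {
          /* Step back to the start of the current (rightmost) label. */
          const char *sa = pa; while (sa > a && sa[-1] != '.') sa--;
          const char *sb = pb; while (sb > b && sb[-1] != '.') sb--;
          int la = (int)(pa - sa), lb = (int)(pb - sb);
          int n = la < lb ? la : lb;
          int c = strncmp(sa, sb, (size_t)n);
          if (c != 0) return c;
          if (la != lb) return la - lb;
          if (sa == a && sb == b) return 0;  /* both exhausted: equal */
          if (sa == a) return -1;            /* a is a suffix of b    */
          if (sb == b) return 1;
          pa = sa - 1;   /* skip the dot, compare the next label left */
          pb = sb - 1;
      }
  }

  int main(void)
  {
      /* Same domain, different leading label: compares late, so they group. */
      printf("%d\n", host_cmp_backwards("www.foo.com", "ftp.foo.com"));
      /* Different domains separate early despite the same leading label.   */
      printf("%d\n", host_cmp_backwards("www.foo.com", "www.bar.org"));
      return 0;
  }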
Basically there are lots of options, but you'll probably just end up using:
urlx -u infile outfile
which uses the special URL sort, or
urlx -u -h infile outfile.html
for making a bookmark file.
In both of the above examples you might want to use -p to retain parameters
(the bits after the question marks, e.g. http://yes.or.no?wintel=crap).
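To picture the scanning itself, here is a minimal portable C sketch that
hunts for scheme prefixes in a buffer and collects the URL characters
that follow (keeping everything up to the first non-URL character,
roughly the -p behaviour). The scheme list and allowed character set are
assumptions for illustration, not what urlx.c actually implements.

  #include <stdio.h>
  #include <string.h>
  #include <ctype.h>

  /* Characters we allow inside a URL; an assumption for this sketch. */
  static int url_char(int c)
  {
      return c != '\0' &&
             (isalnum(c) || strchr("-._~:/?#[]@!$&'()*+,;=%", c) != NULL);
  }

  /* Print every http:// or ftp:// URL found in the buffer. */
  static void extract(const char *buf)
  {
      static const char *schemes[] = { "http://", "ftp://", NULL };
      for (const char *p = buf; *p; p++) {
          for (int i = 0; schemes[i] != NULL; i++) {
              size_t n = strlen(schemes[i]);
              if (strncmp(p, schemes[i], n) == 0) {
                  const char *q = p + n;
                  while (url_char((unsigned char)*q)) q++;
                  printf("%.*s\n", (int)(q - p), p);
                  p = q - 1;   /* resume scanning after the URL */
                  break;
              }
          }
      }
  }

  int main(void)
  {
      extract("see <a href=\"http://example.com/index.html?x=1\">here</a>");
      return 0;
  }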
treecat
-------
This is just a quick hack to let shell (sh/pdksh) users grab URLs from
a complete directory tree. urlx accepts a single dash as meaning input
is from stdin, so you can use something like
treecat cachedirectorypath | urlx -u - outfilename
to produce a file containing every URL in every file in your cache.
You can use this on any browser cache tree.
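For anyone without an Amiga to hand, here is a hedged POSIX sketch of
what treecat does: walk a directory tree and cat every regular file to
stdout. The real tool presumably uses the native AmigaOS directory
calls rather than dirent; see treecat.c in the archive for the source.

  #include <stdio.h>
  #include <string.h>
  #include <dirent.h>
  #include <sys/stat.h>

  /* Recursively cat every regular file under path to stdout. */
  static void treecat(const char *path)
  {
      DIR *d = opendir(path);
      if (d == NULL) return;
      struct dirent *e;
      while ((e = readdir(d)) != NULL) {
          if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
              continue;
          char full[4096];
          snprintf(full, sizeof full, "%s/%s", path, e->d_name);
          struct stat st;
          if (stat(full, &st) != 0) continue;
          if (S_ISDIR(st.st_mode)) {
              treecat(full);               /* recurse into subdirectory */
          } else if (S_ISREG(st.st_mode)) {
              FILE *f = fopen(full, "rb");
              if (f == NULL) continue;
              char buf[8192];
              size_t n;
              while ((n = fread(buf, 1, sizeof buf, f)) > 0)
                  fwrite(buf, 1, n, stdout);
              fclose(f);
          }
      }
      closedir(d);
  }

  int main(int argc, char **argv)
  {
      if (argc == 2) treecat(argv[1]);
      return 0;
  }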
scanv
-----
This is used specifically to pick out the URLs from the headers of the files
in a Voyager cache. This is just the URL of the file itself; by default the
contents are not examined.
NEW (1.01): -c flag to cat (output) the contents of each file for piping to urlx.
urlv
----
This is used specifically to grab URLs from a Voyager history file, usually
called URL-History.1.
urla
----
This is used specifically to grab URLs from an AWeb cache index file,
usually called AWCR.
stricmp_test
------------
Just a quick test prog to see which order the compiler (the libc, really)
sorts strings into in stricmp calls. Different compilers use different
orders :-(
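The idea is easy to reconstruct. Here is a minimal sketch in the same
spirit (the shipped stricmp_test.c may well differ): '_' sits between
'Z' and 'a' in ASCII, so the sign of the result tells you whether the
libc folds to upper or to lower case before comparing. Note stricmp is
non-standard; on POSIX systems substitute strcasecmp from <strings.h>.

  #include <stdio.h>
  #include <string.h>  /* declares stricmp on Amiga compilers like SAS/C */

  int main(void)
  {
      /* '_' is 0x5F: above 'A'..'Z' but below 'a'..'z', so the result
         depends on which case the library folds to before comparing. */
      int r = stricmp("a_b", "aab");
      printf("stricmp(\"a_b\", \"aab\") = %d (folds to %s case)\n",
             r, r < 0 ? "lower" : "upper");
      return 0;
  }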
Contents of comm/www/urlx.lha
PERMSSN UID GID PACKED SIZE RATIO CRC STAMP NAME
---------- ----------- ------- ------- ------ ---------- ------------ -------------
[generic] 6374 10692 59.6% -lh5- 57c8 Sep 3 1999 urlx/bin/scanv
[generic] 4847 8476 57.2% -lh5- 2d87 Aug 29 1999 urlx/bin/stricmp_test
[generic] 5994 10068 59.5% -lh5- 686a Aug 26 1999 urlx/bin/treecat
[generic] 5455 9144 59.7% -lh5- 9ccc Aug 29 1999 urlx/bin/urla
[generic] 5914 9820 60.2% -lh5- ea23 Sep 1 1999 urlx/bin/urlv
[generic] 8778 15076 58.2% -lh5- 302a Sep 3 1999 urlx/bin/urlx
[generic] 497 1824 27.2% -lh5- 08b0 Aug 29 1999 urlx/Makefile
[generic] 1370 3697 37.1% -lh5- fad4 Sep 3 1999 urlx/scanv.c
[generic] 324 1318 24.6% -lh5- 0b80 Aug 25 1999 urlx/stricmp_test.c
[generic] 206 296 69.6% -lh5- 9791 Sep 3 1999 urlx/temp_voyager
[generic] 812 1984 40.9% -lh5- 49b3 Aug 26 1999 urlx/treecat.c
[generic] 11348 42918 26.4% -lh5- 7439 Jul 26 1997 urlx/ubi_BinTree.c
[generic] 9193 35348 26.0% -lh5- c7dd Jul 26 1997 urlx/ubi_BinTree.h
[generic] 1018 2480 41.0% -lh5- 6a26 Aug 29 1999 urlx/urla.c
[generic] 1496 3657 40.9% -lh5- 0ee5 Sep 1 1999 urlx/urlv.c
[generic] 3716 12723 29.2% -lh5- b651 Sep 3 1999 urlx/urlx.c
[generic] 1789 3949 45.3% -lh5- 8eb3 Sep 3 1999 urlx/urlx.readme
---------- ----------- ------- ------- ------ ---------- ------------ -------------
Total 17 files 69131 173470 39.9% Sep 6 1999