HYPERTEXT MARKUP LANGUAGE (HTML)
is virtually the universal language
of webpages on the worldwide web. A simple HTML page consists
of ordinary ASCII text, sandwiched between a standard header
and standard trailer.
HOW DO I MAKE A SIMPLE ASCII FILE?
In Windows 95, 98, or NT, click on START,
then click on Programs,
then click on Accessories, then click on Notepad.
Notepad is also available on Windows 3.x.
In Notepad, a simple ASCII file has a name ending with .txt.
HOW DO I MAKE A SIMPLE HTML FILE?
Make any vanilla (seven-bit) ASCII text-file,
using Windows Notepad, or some other suitable word-processor.
Then insert it into the following boiler-plate:
<html><head><title>
PLACE YOUR TITLE HERE</title></head>
<body>PLACE YOUR MAIN TEXT HERE
<br><br><br></body></html>
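The insertion step above can also be done by a short PERL program. This is only a minimal sketch: the helper name make_page and the sample title and text are hypothetical, and it assumes that the text contains none of the characters < > & that HTML treats specially.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: place a title and a main text
# into the boiler-plate shown above.
sub make_page {
    my ($title, $text) = @_;
    return "<html><head><title>\n"
         . "$title</title></head>\n"
         . "<body>$text\n"
         . "<br><br><br></body></html>\n";
}

# Sample title and text, chosen only for illustration.
print make_page("MY FIRST PAGE", "HELLO FROM NOTEPAD.");
```

Redirecting the output into a file whose name ends with .htm gives a page that a browser can open.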
6. HYPERTEXT MARKUP LANGUAGE.
HYPERTEXT MARKUP LANGUAGE (HTML) IS NOT A PROGRAMMING LANGUAGE.
For all intents and purposes, HTML is a typesetting language.
On the worldwide web, HTML issues the formatting instructions
for text information (also: image, audio, and cinematic video)
that appears on the web page.
If you want programming on the worldwide web,
then you have two choices: programming on the server;
or programming on the client.
The Internet Worldwide Web (WWW)
may be viewed as a collection of WEBPAGES.
You can read my introductory guide to the WWW at URL:
http://www.medparse.com/whatnett.htm
The WWW is sponsored by the WORLDWIDE WEB CONSORTIUM,
a non-profit, international organization, described at URL:
http://www.w3.org/
A STATIC WEBPAGE is a collection of text, images, and audio
that is located at a UNIFORM RESOURCE LOCATOR (URL).
A static webpage provides information, but does not perform
calculations or alter data. However, a static webpage
can LAUNCH a computer program at your website.
For example, the static webpage that describes
the short-sentence parser that we will build together is located at URL:
http://www.medparse.com/
Most static webpages on the worldwide web are written
in a MARKUP LANGUAGE, which controls such features
as font style, font size, color, placement of images,
and foreign language characters. The most popular markup language
is HYPERTEXT MARKUP LANGUAGE (HTML),
which is a scaled-down (simplified) version
of the Standard Generalized Markup Language, described at URL:
http://www.w3.org/MarkUp/SGML.html
A DYNAMIC WEBPAGE is created on-the-fly by a
computer program, or COMPUTER SCRIPT.
The highway between an HTML page and a script is the
COMMON GATEWAY INTERFACE (CGI).
If you click on the SUBMIT button for the
www.medparse.com home webpage,
then a computer program will calculate a parse
for the input sentence, using a script written in
PRACTICAL EXTRACTION AND REPORTING LANGUAGE (PERL).
There are many other scripting languages besides Perl,
including C, M, Python, Java. The advantages of Perl are:
cost-free, easy-to-learn, very powerful.
When you have your FTP access set up,
I will send you a Perl module.
Most webpage names begin with the prefix "www", but as you see,
the owner of the website has the option to use a different prefix.
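Every HOME WEBPAGE has an up-to-twelve-digit number,
supplied by the INTERNET SERVICE PROVIDER (ISP).
This number is more commonly called the INTERNET PROTOCOL (IP) ADDRESS;
it is called the ISP number below.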
The ISP number for www.medparse.com is: xxx.xxx.xxx.xxx.
You can reach any URL through its ISP number, as for example:
http:// xxx.xxx.xxx.xxx
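The up-to-twelve-digit number is written as four decimal fields separated by periods, each field between 0 and 255. The following sketch checks that a string has this form. The helper name is_dotted_quad is hypothetical, and 192.0.2.1 is a reserved documentation address, not the number of any real website.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the string is four decimal
# fields (each 0 to 255) separated by periods, and 0 otherwise.
sub is_dotted_quad {
    my ($address) = @_;
    my @field = split /\./, $address, -1;   # -1 keeps empty trailing fields
    return 0 unless @field == 4;
    for my $f (@field) {
        return 0 unless $f =~ /\A[0-9]{1,3}\z/ && $f <= 255;
    }
    return 1;
}

print is_dotted_quad("192.0.2.1")   ? "OK\n" : "NOT OK\n";   # prints OK
print is_dotted_quad("192.0.2.256") ? "OK\n" : "NOT OK\n";   # prints NOT OK
```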
An international registry of ISP numbers
and URL names is maintained by the company, NETWORK SOLUTIONS, INC.,
which was originally sponsored by the U. S. National Science Foundation.
On your file-transfer-protocol (FTP) software, this ISP number
is also known as the HOSTNAME. The USERID is: turk.
For security reasons, I will send you the password in a separate mailing.
This is an Internet site that you and I can share in our work together.
We will be as close as if we lived in the same city!
As soon as you have your FTP capability in place,
you can start depositing files at the website.
The home webpage file is named: index.htm .
I have another webpage, a brief description of medical ontologies,
in webpage file: whatonto.htm . You can reach this file at URL:
http://www.medparse.com/whatonto.htm
You can also read my introductory guide to the WWW at URL:
http://www.medparse.com/whatnett.htm
You should become familiar with the basics of writing RAW HTML,
in a text-editor such as NOTEPAD.
The reason for learning this is that Internet web programming
works by an exchange of HTML files.
The initial HTML page that you see on the screen contains a SUBMIT button,
which launches a command-sequence to a program in the container, or BIN,
for COMMON GATEWAY INTERFACE programs, or CGI-BIN.
When you click on the SUBMIT button, a command-sequence launches
a CGI program, which, in turn, returns a new HTML page to your browser.
You can write a simple PERL program after an afternoon of study.
The simplest HTML page is the following blank page:
<html> <body> </body> </html>
The PERL command for writing this HTML page
back to your browser is:
print " <html> <body> </body> </html> ";
The entire PERL program for writing this HTML page back to your browser is:
#!/usr/bin/perl
print "Content-type: text/html\n\n";
print " <html> <body> </body> </html> ";
exit;
The next step in writing an HTML page is to give it some content.
The HTML page may be divided into a HEAD and BODY.
Within the HEAD is the TITLE. Thus:
<html> <head> <title> PLACE TITLE HERE </title> </head>
<body> PLACE CONTENT HERE </body> </html>
There are a few tips for composing your HTML page:
LINE BREAK IS: <br>
SOLID LINE IS: <hr>
BULLET IS: <li>
BOLDFACE IS: <b> FOLLOWED BY: </b>
ITALIC IS: <i> FOLLOWED BY: </i>
UNDERLINE IS: <u> FOLLOWED BY: </u>
STRIKETHROUGH IS: <s> FOLLOWED BY: </s>
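The tags in this list can be tried out together in one small PERL program. This is a sketch with made-up sample text. The <ul> and </ul> tags around the bullet are an assumption added here (they are not in the list above), because a bullet normally sits inside a list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Body text built from the tags in the list above.
# The <ul> ... </ul> wrapper around the bullet is an added assumption.
my $body = join "",
    "FIRST LINE<br>\n",
    "SECOND LINE\n",
    "<hr>\n",
    "<ul><li>A BULLET</ul>\n",
    "<b>BOLDFACE</b> <i>ITALIC</i> <u>UNDERLINE</u> <s>STRIKETHROUGH</s>\n";

print "Content-type: text/html\n\n";
print "<html><body>\n", $body, "</body></html>\n";
```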
You should find a good HTML book to get the details.
Borrowing HTML code from other webpages is the best way
to add interesting new tricks to your HTML page.
That is, surf the net, find some stuff that you like,
DISPLAY THE HTML SOURCE CODE on your browser,
and borrow the relevant code.
Just so that you don't break any copyrights
or offend anybody, you should disguise your own
code enough so that its source is not immediately recognizable.
If you are a real gentleman and scholar,
then you could make a hyperlink to the source
where you got the interesting coding idea.
An HTML page is a static entity.
In order for you to dynamically collect information
from a web-user, and return an individualized response,
the web-user must launch a cgi-program from the initial HTML page,
and receive a programmed response, based upon the command-sequence
launched by the web-user, and resulting in a new HTML page composed
on-the-fly by the cgi-program.
For example, the following HTML program
launches a command-sequence to the program PERLHLLO.CGI
in the cgi-bin subdirectory of the account.
<html> <body>
<form name="sender"
method="get"
action="http://www.medparse.com/cgi-bin/perlhllo.cgi">
<br><input type="submit" name="bx" value="SUBMIT">
<br></form>
</body> </html>
The job for program PERLHLLO.CGI is to intercept this command sequence
and return a customized HTML page back to the web-user.
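Here is a minimal sketch of how a CGI program might intercept the command sequence. It is not the actual PERLHLLO.CGI. With method="get", the web server hands the command sequence to the program in the QUERY_STRING environment variable; for the form above, the sequence is bx=SUBMIT. The helper names parse_query and escape_html are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a query string such as "bx=SUBMIT"
# into a hash of field names and values.
sub parse_query {
    my ($query) = @_;
    my %field;
    for my $pair (split /&/, $query) {
        next if $pair eq "";
        my ($name, $value) = split /=/, $pair, 2;
        $field{$name} = defined $value ? $value : "";
    }
    return %field;
}

# Hypothetical helper: make a value safe to show inside an HTML page.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# When run outside a web server, QUERY_STRING is not set,
# so the program falls back to an empty command sequence.
my $query  = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : "";
my %field  = parse_query($query);
my $button = exists $field{bx} ? escape_html($field{bx}) : "(NONE)";

print "Content-type: text/html\n\n";
print "<html><body> BUTTON PRESSED: $button <br></body></html>\n";
```

The value is escaped before printing because it comes from the web-user, and should not be trusted to contain only plain text.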
7. FILE TRANSFER PROTOCOL.
All necessary files for FTP and PERL
are stored in BINARY mode in the \cgi-bin subdirectory.
I suggest that you make a parallel C:\cgi-bin
subdirectory in your personal computer hard drive,
and transfer the following PERL and WS_FTP files in BINARY MODE:
PERL.EXE
PERL100.DLL
PERLGLOB.EXE
PERLHLLO.CGI
WS_FTP.EXE
WS_FTP.EXT
WS_FTP.INI
WS_FTP.HLP
WS_FTP.TXT
The executable files (PERL.EXE and WS_FTP.EXE)
only function in an MS-DOS directory,
with WINDOWS 95 or WINDOWS 98 running.
(WINDOWS 3.x does not have all the requisite DLL files.)
The four PERL*.* files allow you
to execute PERL5 programs in MS-DOS.
The TURK account has its own PERL5 interpreter,
that executes PERL5 programs in its UNIX environment.
The five WS_FTP files comprise a file-transfer program.
As a first step, download these files into
an MS-DOS directory (I suggest c:\turk),
with WINDOWS 95 or WINDOWS 98 running.
THE FIRST THING THAT YOU ABSOLUTELY NEED TO UNDERSTAND,
and which will drive you crazy if you
don't keep track of correctly,
is that the LINE DELIMITER in MS-DOS
(i.e., the operating system of your personal computer)
is CARRIAGERETURNLINEFEED, i.e., ASCII 13 followed by ASCII 10;
whereas the line delimiter in UNIX
(i.e., the operating system of the
ISP account, and most other inexpensive Internet accounts)
is LINEFEED ONLY, i.e., ASCII 10 only.
Whenever you transfer a file between
MS-DOS and UNIX, you MUST pay attention to this difference.
You have two options: you can either transfer files
in ASCII MODE or in BINARY MODE.
In BINARY MODE, the file is transferred in either direction
byte-by-byte, with absolutely no modification.
In ASCII MODE, the file is modified during
transfer, as follows: from MS-DOS to UNIX,
the extraneous CARRIAGERETURNs are stripped;
from UNIX to MS-DOS, each LINEFEED in UNIX is
supplemented with a preceding CARRIAGERETURN.
If you make one of these file-transfers incorrectly,
the resulting file will be unreadable or unexecutable or both.
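The two ASCII MODE conversions can be sketched in PERL as follows. The helper names dos_to_unix and unix_to_dos are hypothetical, and the sample text is made up. The UNIX-to-MS-DOS direction is written so that a file which already has CARRIAGERETURNs is not given a second set; this is a small precaution beyond the rule stated above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# MS-DOS to UNIX: strip the CARRIAGERETURN (ASCII 13, octal 015)
# that precedes each LINEFEED (ASCII 10, octal 012).
sub dos_to_unix {
    my ($text) = @_;
    $text =~ s/\015\012/\012/g;
    return $text;
}

# UNIX to MS-DOS: put a CARRIAGERETURN before each LINEFEED.
# The optional \015 keeps an already-converted file from being doubled.
sub unix_to_dos {
    my ($text) = @_;
    $text =~ s/\015?\012/\015\012/g;
    return $text;
}

my $dos  = "ONE\015\012TWO\015\012";
my $unix = dos_to_unix($dos);
print length($dos), " bytes in MS-DOS form, ", length($unix), " bytes in UNIX form\n";
# prints: 10 bytes in MS-DOS form, 8 bytes in UNIX form
```

If you apply these helpers to a real file in MS-DOS, open the file with binmode, so that PERL itself does not translate the line delimiters while reading or writing.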
Since UNIX preceded MS-DOS historically,
and is more widely available to the world at large
through the Internet, Microsoft is to blame
for introducing this fundamental incompatibility
in its MS-DOS operating system.
However, we are now stuck with this incompatibility forever.
I keep track of my file-transfers as follows:
all *.HTM and *.CGI files are transferred in ASCII MODE.
all other files are transferred in BINARY MODE.
Furthermore, I name all my files with mainname at most
eight characters, extensionname exactly three letters, all lower case.
This way, there are no name incompatibilities between operating systems.
I never deviate from these rules, so I never forget what I'm doing.
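These rules are simple enough to check by program. The following is a sketch only; the helper names is_portable_name and transfer_mode are hypothetical, and it assumes that a mainname is made of lower-case letters, digits, and the underscore.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: 1 if the name has a mainname of one to eight
# characters and an extension of exactly three letters, all lower case.
# Assumption: mainname characters are a-z, 0-9, and underscore.
sub is_portable_name {
    my ($name) = @_;
    return $name =~ /\A[a-z0-9_]{1,8}\.[a-z]{3}\z/ ? 1 : 0;
}

# Hypothetical helper: *.htm and *.cgi files go in ASCII MODE,
# all other files go in BINARY MODE.
sub transfer_mode {
    my ($name) = @_;
    return $name =~ /\.(?:htm|cgi)\z/i ? "ASCII" : "BINARY";
}

for my $name ("index.htm", "perlhllo.cgi", "perl.exe", "Index.html") {
    my $verdict = is_portable_name($name) ? "NAME OK" : "NAME NOT PORTABLE";
    print "$name: ", transfer_mode($name), " MODE, $verdict\n";
}
```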
8. PERL.
PERL (PRACTICAL EXTRACTION AND REPORTING LANGUAGE)
is the hottest programming language on the Internet.
It is easy to learn a little bit of PERL,
once you have launched your first program.
PERL is the best scripting-language for handling text-strings,
text-files, and for collecting text-information from websites.
The most important property of a PERL script is that it
can be launched from one HTML page
and return a response in the form of another HTML page.
The most efficient strategy for writing PERL programs
is to debug the PERL program in your local MS-DOS
environment, and then move the program into
the CGI-BIN subdirectory for final testing.
DON'T FORGET: YOU SHOULD ALWAYS MOVE PERL/CGI PROGRAMS
IN ASCII MODE.
I suggest writing the following program in NOTEPAD
in your MS-DOS environment, as file PERLHLLO.CGI
#!/usr/bin/perl
print "Content-type: text/html\n\n";
print "<html><head><title> HELLO, WORLD. ";
print "</title></head>\n";
print "<body> HELLO, WORLD. <br></body></html>\n";
exit;
You may test program PERLHLLO.CGI in the c:\cgi-bin directory
in your MS-DOS environment, with the command:
perl PERLHLLO.CGI
The program will print the Content-type line, a blank line,
and then an HTML-ready file as follows:
<html><head><title> HELLO, WORLD. </title></head>
<body> HELLO, WORLD. <br></body></html>
The next step is to move program PERLHLLO.CGI
into the CGI-BIN subdirectory on the Internet for final testing.
I will tell you how to do it with WS_FTP,
although any File Transfer Protocol (FTP) program
is OK. However, I only know how to launch UNIX security commands in WS_FTP,
so if you use a different FTP program,
you'll have to figure out how to do this yourself.
<html><head><title> Launch Program. </title></head>
<body><center> Launch Program.
<form name="sender" method="get"
action="http://www.medparse.com/cgi-bin/perlhllo.cgi">
<input type="submit" name="bx" value="SUBMIT"></form>
</center><br></body></html>
The same result is obtained at URL:
http://www.medparse.com/cgi-bin/perlhllo.cgi
The WORLDWIDE WEB (WWW)
consists of a decentralized collection of
STATIC WEBPAGES,
each residing on a SERVER,
supported by an
INTERNET SERVICE PROVIDER (ISP).
Ordinary users may rent basic services
from an ISP in the Baltimore/Washington area
for as little as $15 per month.
Each webpage has a unique name, or UNIFORM RESOURCE LOCATOR (URL),
consisting of numbers and letters and some punctuation marks.
A single webpage displays a fixed body of textual, image,
and audio information. In this discussion, the webpage is written
in the HYPERTEXT MARKUP LANGUAGE (HTML).
Except within very narrow limits, a static webpage does not process information
entered by the user.
The user may seek additional information by
LINKING to another static webpage.
Alternatively,
the user may enter data or commands into a static webpage,
and LAUNCH these data, typically by clicking a SUBMIT BUTTON.
The launch data are sent across the
COMMON GATEWAY INTERFACE (CGI)
to a computer program, or COMPUTER SCRIPT.
The computer script processes the launch data,
and returns a new webpage, written in HTML,
to the original user.
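Before the launch data reach the computer script, the browser encodes them: each space becomes a plus sign, and each special character becomes a percent sign followed by two hexadecimal digits. The script must undo this encoding. A minimal sketch follows; the helper name url_decode and the sample data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: undo the encoding that the browser
# applies to launch data.
sub url_decode {
    my ($data) = @_;
    $data =~ tr/+/ /;                                 # plus sign back to space
    $data =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;     # %21 back to "!"
    return $data;
}

print url_decode("HELLO%2C+DR.+WORLD%21"), "\n";
# prints: HELLO, DR. WORLD!
```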
For the discussion that follows,
all computer scripts are written in the
PRACTICAL EXTRACTION AND REPORTING LANGUAGE (PERL).
Thus, the entire functionality of the worldwide web
can be viewed as static webpages which either link to one another,
or else launch data that create new webpages.
This worldwide, interlocking system of information packages
was first envisioned by VANNEVAR BUSH,
a U. S. scientist and futurist, active in the 1940s
at the dawn of the computer age.
This report will be confined to webpages written in HTML
and computer scripts written in PERL.
We will walk through a set of instructional HTML pages
and PERL scripts which may be visited directly in this email.
For each example, say perlxxxx, the webpage may be viewed
by clicking on URL:
http://www.medparse.com/cgi-bin/perlxxxx.cgi
You may examine the HTML document by clicking VIEW SOURCE DOCUMENT
on your browser. You may view the corresponding PERL script
at URL:
http://www.medparse.com/perlxxxx.cgi
The following instructional PERL scripts are available.
perlxxxx.cgi
perlnull.cgi
perlread.cgi
perlwrit.cgi
perljoin.cgi
perlsplt.cgi
perlrado.cgi
perltext.cgi
perlchkb.cgi
perlwhil.cgi
perlasci.cgi
perlasoc.cgi
perlaray.cgi
perltime.cgi
perlescp.cgi
Each example-script is designed to illustrate
a particular feature of HTML or PERL.
An HTML file consists of text, images, and audio,
whose size, positions, fonts, colors, etc.,
are delineated by TAGS.
Typically, these tags are PAIRED,
i.e., the functionality begins with a START-TAG
and concludes with an END-TAG.
A start-tag consists of <...>
where ... is the TAGNAME.
An end-tag consists of a matching </...>
Some tags are SINGLE.
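The pairing of a start-tag with its end-tag can be sketched in PERL. The helper name tag is hypothetical; it simply surrounds a piece of content with a matching start-tag and end-tag, and calls to it can be nested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: surround content with a start-tag <name>
# and the matching end-tag </name>.
sub tag {
    my ($name, $content) = @_;
    return "<$name>$content</$name>";
}

print tag("html", tag("body", tag("b", "BOLD TEXT"))), "\n";
# prints: <html><body><b>BOLD TEXT</b></body></html>
```

A single tag, such as <br>, has no end-tag, so it would be written directly into the content and not through this helper.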
The simplest HTML file is:
<html>...</html>
This file consists of the HTML START-TAG
and the HTML END-TAG.
Typically, an HTML file contains a
HEAD and a BODY.
The head contains the title,
optional indexing information,
and sometimes a small program that controls
graphical functions on the page.
The BODY contains the main textual and image information
of the document.
Thus, most HTML documents have the following stereotypical structure:
<html>
<head>
<title>
PLACE YOUR TITLE HERE.
</title>
</head>
<body>
PLACE YOUR TEXT HERE.
</body>
</html>
The entire PERL script that generates this HTML page is:
#!/usr/bin/perl
print "Content-type: text/html\n\n";
print " <html> <head> ";
print " <title> PLACE YOUR TITLE HERE. </title> </head> ";
print " <body> PLACE YOUR TEXT HERE. </body> </html> ";
exit;
You may display this PERL script by clicking on URL:
http://www.medparse.com/cgi-bin/perlxxxx.cgi
Last Updated: October 17, 2001, by G. William Moore, MD, PhD.