by Wayne Berry
Common Gateway Interface (CGI) was developed so that a Web browser could pass parameters to a Web server, regardless of the platform on which either of the machines is running. For instance, if the Web browser is running on a Macintosh and the server is running on a UNIX machine, the browser could pass parameters that both machines can understand. CGI scripts is a common name for programs that run on the server side. The name scripts comes from the fact that in the beginning, most server-side implementations were Perl scripts running on a UNIX box. CGI scripts do not have to be Perl scripts; they can be any program, scripted or compiled, that the Web server is allowed to execute.
The parameters to the CGI scripts usually originate in a <FORM></FORM> tag within the page that the browser is displaying. Encapsulated within the <FORM> tag are <INPUT> tags that create 'name = value' parameters. Other HTML tags also create 'name = value' parameter pairs. When the form is submitted to the server, the browser encodes the parameters according to the CGI standard. The browser puts all the parameters into a single string without spaces: it replaces each space with a + symbol and converts other special symbols to hexadecimal values preceded by a % symbol. The first half of the string is the URL of the CGI script on the server. The second half of the string is the 'name = value' pairs. The two halves are separated by a ? symbol, and the 'name = value' pairs are separated from each other by & symbols.
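The encoding described above can be sketched in a few lines of portable C++. This is only an illustration, not the browser's actual code: the function names EncodeComponent and BuildRequest are hypothetical, and the choice to pass letters, digits, '.', '-', and '_' through unchanged is an assumption made so that a value such as 2.00 stays readable.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Encode one name or value (hypothetical helper): a space becomes '+',
// unreserved characters pass through, and everything else becomes
// '%' followed by two hexadecimal digits.
std::string EncodeComponent(const std::string& text) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : text) {
        const bool alnum = (c >= '0' && c <= '9') ||
                           (c >= 'A' && c <= 'Z') ||
                           (c >= 'a' && c <= 'z');
        if (alnum || c == '.' || c == '-' || c == '_') {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';
        } else {
            out += '%';
            out += kHex[c >> 4];
            out += kHex[c & 0x0F];
        }
    }
    return out;
}

// Join the script URL and the encoded pairs: '?' separates the two
// halves and '&' separates one pair from the next.
std::string BuildRequest(
    const std::string& scriptUrl,
    const std::vector<std::pair<std::string, std::string>>& fields) {
    std::string request = scriptUrl;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        request += (i == 0) ? '?' : '&';
        request += EncodeComponent(fields[i].first);
        request += '=';
        request += EncodeComponent(fields[i].second);
    }
    return request;
}
```

With this sketch, a value of "John Doe" is encoded as John+Doe, and an ampersand inside a value is sent as %26 so that it cannot be mistaken for a pair separator.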
Table 13.1 is an example of the HTML Input
Here are several examples of HTML text that can be used to pass parameters to a CGI Script.
If the browser is reading a single input form page that contains these tags:
<FORM ACTION="http://www.myserver.com/scripts/mycgi.exe"> <INPUT TYPE=HIDDEN NAME="Item" VALUE="Oranges"> <INPUT TYPE=SUBMIT> </FORM>
the server would receive a request of
http://www.myserver.com/scripts/mycgi.exe?Item=Oranges
If the browser is reading a multiple input form page that contains these tags:
<FORM ACTION="http://www.myserver.com/scripts/mycgi.exe"> <INPUT TYPE=HIDDEN NAME="Item" VALUE="Oranges"> <INPUT TYPE=HIDDEN NAME="Price" VALUE="2.00"> <INPUT TYPE=SUBMIT> </FORM>
the server would receive a request of
http://www.myserver.com/scripts/mycgi.exe?Item=Oranges&Price=2.00
Hidden INPUT types like those in the preceding examples are not shown to the user by the browser. More interesting INPUT types and other tags allow user interaction. If the browser is reading a page in which the INPUT is associated with a text box and the SELECT is associated with a drop-down list containing two choices, 1.00 or 2.00, you would have code something like this:
<FORM ACTION="http://www.myserver.com/scripts/mycgi.exe"> <INPUT TYPE=TEXT NAME="Item"> <SELECT NAME="Price"> <OPTION>1.00 <OPTION>2.00 </SELECT> <INPUT TYPE=SUBMIT> </FORM>
When the user submits the form, the parameters are sent to mycgi.exe just as in the multiple input form's request, with the values that the user typed and selected.
On receiving a request from a browser, the Web server interprets the request and decides if it is a request for a CGI script or a static page, such as an HTML text file. If the request is for a CGI script, the server passes the parameters that are referenced in the URL to the CGI script. The CGI script decodes the parameters and uses them to create output that is sent back to the server. The server takes the output and returns it to the browser.
As static text files, HTML pages start with the <HTML> tag and end with the </HTML> tag. When the server reads a static page from the disk, it not only outputs the text file to the browser but also adds a header at the beginning. The header contains information for the browser that describes the server state and the content of the information following the header. The header contains a status line to indicate that the server transaction completed successfully, and it also contains a line to indicate the format of the information following the header.
When your CGI script executes, it must generate its own header information to be sent back to the server. The capability of sending back the header line allows the script to notify the server whether or not it has run successfully. For now, assume that the CGI script is returning an HTML page to the browser.
A successful execution written in the header might look like this:
Status: 200
Content-type: text/html

<HTML>
<BODY>
Hello World
</BODY>
</HTML>
Notice the extra line between the header and the first <HTML> tag. This blank line is very important syntax for the Web browser; without it, your CGI script won't run properly. The server returns the output of your CGI script to the client's browser. When viewed as source from the browser, the header won't be displayed. 200 is a standard success code. Table 13.2 contains other codes.
Notice that codes in the 200s are used when the action was successfully received, understood, and accepted. Codes in the 300s are used when further action must be taken in order to complete the request. Codes in the 400s are used when the request contains bad syntax or cannot be fulfilled. Codes in the 500s are used when the server failed to fulfill an apparently valid request.
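The grouping of status codes by their hundreds digit can be expressed as a small lookup. The following is a minimal sketch; the function name DescribeStatusClass and the wording of the returned strings are hypothetical, chosen only to mirror the four classes described above.

```cpp
#include <string>

// Map a status code to the class it belongs to (hypothetical helper).
// Only the hundreds digit matters for the classification.
std::string DescribeStatusClass(int status) {
    switch (status / 100) {
        case 2: return "success";                  // received, understood, accepted
        case 3: return "further action required";  // more steps needed to complete
        case 4: return "bad request";              // bad syntax or cannot be fulfilled
        case 5: return "server error";             // server failed on a valid request
        default: return "unknown";
    }
}
```

A script could use a helper like this when logging, so that a 404 and a 400 are both recorded as problems with the request rather than with the server.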
When your CGI script runs into an error, either processing information or accessing resources like SQL Server, it should return both an error status code and some HTML text describing the problem. For instance:
Status: 500
Content-type: text/html

<HTML>
<BODY>
Server Error, please try again later
</BODY>
</HTML>
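Because every response needs the same header followed by a blank line, it can help to build the whole response in one place. The sketch below assumes portable C++ and a hypothetical function named BuildCgiResponse; it ends each header line with a carriage return and a new line, and it is not tied to any particular Web server.

```cpp
#include <string>

// Build a complete CGI response (hypothetical helper): a status line,
// a content-type line, the blank line that ends the header, and then
// the HTML body.
std::string BuildCgiResponse(int status, const std::string& htmlBody) {
    std::string response;
    response += "Status: " + std::to_string(status) + "\r\n";
    response += "Content-type: text/html\r\n";
    response += "\r\n";  // the blank line between header and body
    response += "<HTML><BODY>" + htmlBody + "</BODY></HTML>\n";
    return response;
}
```

A script would write the returned string to standard output, for example BuildCgiResponse(500, "Server Error, please try again later") when a resource is unavailable.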
Before you get started making your own scripts and viewing them, you need to have a development environment. I prefer to have a single machine running NT 3.51 that has the browser, the Web server, and my compiler on it. Debugging takes place on the Web server because the CGI scripts are running on the Web server. With Microsoft Developer Studio, you can debug the CGI scripts you write, so it makes sense to have both the compiler and the Web server on the same machine. I prefer not to swivel between two machines because it will be the browser that activates your scripts on the Web server. Finally, you need to have a Web server on which there is low activity. A poorly written script can crash the Web server, causing downtime for other users.
Writing CGI scripts for Web servers is like allowing everyone to run a program on your machine. The first step to good security is to make sure that the users cannot read your script. This isn't a problem if you use a compiled program written in a language such as C++. It is a concern when you're writing in batch or another type of runtime script. Make sure that the directory that contains your script has EXECUTE permissions, but not READ. A good example is the default IIS (Microsoft's Internet Information Server) script directory. This will allow users to execute the CGI scripts but not read them.
The default installation of IIS can execute batch files as CGI scripts. Batch files, considered the scripting language of DOS, are not as powerful as Perl scripts in UNIX. Because batch files lack string handling functions, they are limited in use as CGI scripts. Their only advantage is that they are a runtime language and make good sample programs.
Let's create a "Hello World!" CGI script. Create a batch file named Lst13_1.bat in the scripts directory of your Web server, as shown in Listing 13.1.
Listing 13.1. The Hello World Example
@echo off
REM Header
echo Status: 200
echo Content-type: text/html
echo.
REM Body
echo "<HTML><BODY>Hello World!</BODY></HTML>"
To run the script, type a URL address into your browser as http://MYMACHINE/scripts/Lst13_1.bat, where MYMACHINE is the name of your computer.
The first thing to notice is that the HTML text, including "Hello World!", is in quotes. This is because the echo statement treats > as the symbol that redirects the output to a file or device. Inside double quotes, the > is output instead of being treated as a redirection. This problem is one reason that batch files make poor CGI scripts.
The server puts the CGI parameters into the environment variable QUERY_STRING. QUERY_STRING contains the CGI string in its CGI-coded form. It is the responsibility of the CGI script to get the information it needs from QUERY_STRING. Unfortunately, batch files do not have functions for string manipulation. This means that you will be able to view the CGI string but not separate the 'name = value' pairs or translate the CGI hexadecimal values. This problem is another good reason not to use batch files for CGI scripts.
Listing 13.2 allows the user to view the CGI parameters passed to the batch file.
Listing 13.2. A batch file that returns the Query string.
@echo off
REM Header
echo Status: 200
echo Content-type: text/html
echo.
REM Body
echo "<HTML><BODY>Query String: %QUERY_STRING%</BODY></HTML>"
To run the script, type a URL address into your browser as http://MYMACHINE/scripts/Lst13_2.bat, where MYMACHINE is the name of your computer. Notice that QUERY_STRING doesn't contain a value; the browser displays "Query String:" as the text on the HTML page. Now change the URL to http://MYMACHINE/scripts/Lst13_2.bat?Name=John. The browser now displays "Query String: Name=John" as the text on the HTML page.
Besides the parameters passed to the server as part of the URL, the server also puts other information about the browser and the server state in environment variables (see Table 13.3).
Listing 13.3 is an example of a batch file that displays all the major environment variables.
Listing 13.3. Views all the return values from a batch file CGI script.
@echo off
REM Header
echo Status: 200
echo Content-type: text/html
echo.
REM Body
echo "<HTML><BODY>"
echo QUERY_STRING: %QUERY_STRING% "<BR>"
echo ALL_HTTP: %ALL_HTTP% "<BR>"
echo HTTP_USER_AGENT: %HTTP_USER_AGENT% "<BR>"
echo HTTP_REFERER: %HTTP_REFERER% "<BR>"
echo HTTP_CONTENT_TYPE: %HTTP_CONTENT_TYPE% "<BR>"
echo HTTP_CONTENT_LENGTH: %HTTP_CONTENT_LENGTH% "<BR>"
echo HTTP_EXTENSION: %HTTP_EXTENSION% "<BR>"
echo AUTH_TYPE: %AUTH_TYPE% "<BR>"
echo CONTENT_LENGTH: %CONTENT_LENGTH% "<BR>"
echo CONTENT_TYPE: %CONTENT_TYPE% "<BR>"
echo GATEWAY_INTERFACE: %GATEWAY_INTERFACE% "<BR>"
echo HTTP_ACCEPT: %HTTP_ACCEPT% "<BR>"
echo PATH_INFO: %PATH_INFO% "<BR>"
echo PATH_TRANSLATED: %PATH_TRANSLATED% "<BR>"
echo REMOTE_ADDR: %REMOTE_ADDR% "<BR>"
echo REMOTE_HOST: %REMOTE_HOST% "<BR>"
echo REMOTE_USER: %REMOTE_USER% "<BR>"
echo REQUEST_METHOD: %REQUEST_METHOD% "<BR>"
echo SCRIPT_NAME: %SCRIPT_NAME% "<BR>"
echo SERVER_NAME: %SERVER_NAME% "<BR>"
echo SERVER_PORT: %SERVER_PORT% "<BR>"
echo SERVER_PROTOCOL: %SERVER_PROTOCOL% "<BR>"
echo SERVER_SOFTWARE: %SERVER_SOFTWARE% "<BR>"
echo "</BODY></HTML>"
Save the preceding example as Lst13_3.bat in your scripts directory and call it from your wwwroot directory by creating a Lst13_3.htm that looks like this:
<HTML> <BODY> <FORM ACTION="http://MYMACHINE/scripts/lst13_3.bat"> Name: <INPUT TYPE=TEXT NAME="Name"> <INPUT TYPE=SUBMIT> </FORM> </BODY> </HTML>
There are two methods that FORM can use to transmit the parameters for the CGI script: GET and POST. You choose which method to use in the <FORM> tag by entering METHOD=POST or METHOD=GET. If you don't enter a method, the form defaults to GET. Both methods send information to the server by way of CGI, but in different ways.
The main difference between POST and GET is the way in which you receive the CGI parameters. With GET, you get the parameters through QUERY_STRING. With POST, the parameters are piped into the CGI script through standard input (stdin). Another difference is that GET can support only 255 characters in the CGI string, whereas POST places no limit on its length.
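The two ways of receiving the CGI string can be put side by side in a short sketch. This is portable C++ written for illustration only: ReadCgiString is a hypothetical name, its first three arguments stand for the values of REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH, and the input stream is passed in as a parameter (an assumption made here so that the function does not depend on a live server).

```cpp
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>  // std::istringstream, handy when no server is available
#include <string>

// Return the raw, still encoded CGI string for one request
// (hypothetical helper).
// GET:  the string is the value of QUERY_STRING.
// POST: the string is read from the input stream, and the value of
//       CONTENT_LENGTH says how many characters to read.
std::string ReadCgiString(const std::string& method,
                          const std::string& queryString,
                          const std::string& contentLength,
                          std::istream& input) {
    if (method == "GET") {
        return queryString;
    }
    if (method == "POST") {
        const long length = std::strtol(contentLength.c_str(), nullptr, 10);
        if (length <= 0) {
            return "";
        }
        std::string body(static_cast<std::size_t>(length), '\0');
        input.read(&body[0], length);
        // Keep only what was actually read if the input ended early.
        body.resize(static_cast<std::size_t>(input.gcount()));
        return body;
    }
    return "";  // neither GET nor POST
}
```

In a real script the stream would be std::cin and the three strings would come from the environment, for example through std::getenv. Passing a std::istringstream instead makes it possible to exercise the POST branch without a Web server.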
Change Lst13_3.htm's FORM to read
<FORM ACTION="http://MYMACHINE/scripts/Lst13_3.bat" METHOD=POST>
Now, try resubmitting the form to Lst13_3.bat. Notice that QUERY_STRING isn't filled in. Also notice that CONTENT_LENGTH has a value of 0 with the GET method and a nonzero value, the length of the posted CGI string, with the POST method. Look at the address box at the top of your browser: with a GET request the CGI parameters appeared in the URL, but with a POST request they don't appear. This is important for passing confidential information from one page to another. POST also gives a cleaner look to your Web page. Finally, note that REQUEST_METHOD is POST and not GET. The differences between POST and GET are listed in Table 13.4.
Batch files have no way of supporting standard input (stdin). By using CGI scripts created in C++, you can handle standard input and separate the 'name = value' pairs passed in by the form.
To create a C++ CGI script, I use VC++ 4.1 and MFC. In Microsoft Windows, all C++ CGI scripts must be console applications. It's important to remember that more than one user may be using your application at a time. For every user executing the script, a new instance of the application is started.
To create a C++ CGI script start by
Listing 13.4. A C++ Hello World Example
// lst13_4.cpp
#include <stdio.h>

int main( int argc, char *argv[ ], char *envp[ ] )
{
    // Header
    printf("Status: 200\r\n");
    printf("Content-type: text/html\r\n");
    printf("\r\n");
    // Body
    printf("<HTML><BODY>Hello World!</BODY></HTML>\n");
    return 0;
}
In step 7, you chose a shared mfc40.dll instead of a static library. The reason for this is that it cuts down on operating overhead: a smaller executable is better. Sharing DLLs makes for smaller executables because the MFC code is in a DLL, not bound to your executable. If more than one person is using your application, then more than one instance of your application is loaded into memory, but only one shared mfc40.dll is loaded. Also, the smaller your CGI script, the quicker it will load, allowing the page to be sent to the user faster. The thing to remember about using shared DLLs is that you will need to copy mfc40.dll to the server along with your CGI script. Remember that the preceding example assumes that your compiler and your server are on the same machine. This means that mfc40.dll will be in your system path and you will not need to copy it.
In step 7, you chose to use MFC, yet the code used as an example didn't reference MFC. The preceding example is intended to be used as a generic example of how to create a CGI script. Other examples in this chapter will reference MFC.
Make sure to compile the debug configuration. CGI Scripts will run in both debug and release. For debugging purposes, you need to have a debug build. Final products should be in a release build because release executables are smaller and take less time for the Web server to load.
More advanced users of Microsoft Developer Studio may want to create their projects within the scripts directory of the IIS. The advantage of this is that you don't have to copy the executable (Step 12). When naming the project, select the scripts directory as the location instead of the default projects directory. When you compile your executable, the CGI script will be built into the scripts directory. The browser URL will also be different:
MYMACHINE/scripts/Lst13_4/debug/Lst13_4.exe
where MYMACHINE is the name of your computer.
The CGI scripts are no different than regular console applications. They can be run from a DOS command line and will display exactly the same information that they send to the server. In fact, the way to send information to the server is to write to standard out (stdout), just like a console application. Running the console application is a way of debugging your executable.
Notice the header section of the Listing 13.4 source code. Each header line ends with both a carriage return \r and a new line character \n; these are required. Also note that the required blank line is represented by a \r\n on its own.
Create another console application as you did in Listing 13.4, call it Lst13_5, and use the source from Listing 13.5. Compile it and copy it to your server's scripts directory.
Listing 13.5. An example of viewing the query string.
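// lst13_5.cpp
#include <stdio.h>
#include <afx.h>

int main( int argc, char *argv[ ], char *envp[ ] )
{
    // Header
    _tprintf(_T("Status: 200\r\n"));
    _tprintf(_T("Content-type: text/html\r\n"));
    _tprintf(_T("\r\n"));
    // Body
    _tprintf(_T("<HTML><BODY>"));
    DWORD dwBufferSize=50;
    LPTSTR szQuery = new TCHAR[dwBufferSize];
    DWORD dwLength=GetEnvironmentVariable(_T("QUERY_STRING"),szQuery, dwBufferSize);
    // Use an empty string if QUERY_STRING is missing
    // or is too long to fit in the buffer
    if (dwLength==0 || dwLength>=dwBufferSize)
        szQuery[0]=_T('\0');
    _tprintf(_T("QueryString: %s"),szQuery);
    _tprintf(_T("</BODY></HTML>\n"));
    // Memory allocated with new [] is released with delete []
    delete [] szQuery;
    return 0;
}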
Copy Listing 13.3's HTML form from Lst13_3.htm to Lst13_5.htm and change the Form attributes to read
<FORM ACTION="http://MYMACHINE/scripts/Lst13_5.exe" METHOD=GET>
Load Lst13_5.htm in your browser, type a name, and submit the data to your C++ CGI script. The query string should represent the name you typed.
With the power of C++, you can take the query string passed by the server and resolve the 'name = value' pairs of CGI into C variables that you can use. In addition, you can retrieve the information from standard input (stdin) and process post methods.
First, you must make a form that sends interesting data to your CGI script. Save the code in Listing 13.6 as lst13_6.htm in the wwwroot directory of your server.
Listing 13.6. Reading CGI parameters.
<HTML> <BODY> <FORM ACTION="http://MYMACHINE/scripts/Lst13_6.exe" METHOD=GET> Name: <INPUT TYPE=TEXT NAME="Name"><BR> Age: <INPUT TYPE=TEXT NAME="Age"><BR> <INPUT TYPE=SUBMIT> </FORM> </BODY> </HTML>
Create, compile, and copy to the server a CGI script called Lst13_6.exe that contains the following source code:
// Lst13_6.cpp
#include <stdio.h>
#include <afx.h>

// Global Cache for Post
// You can only read from
// Standard In Once
TCHAR *szCache=NULL;

TCHAR ConvertHex(TCHAR cHigh, TCHAR cLow)
{
    static const TCHAR szHex[] = _T("0123456789ABCDEF");
    LPCTSTR pszLow;
    LPCTSTR pszHigh;
    TCHAR cValue;

    // Find the Values in the Hex String
    pszHigh = _tcschr(szHex, (TCHAR) _totupper(cHigh));
    pszLow = _tcschr(szHex, (TCHAR) _totupper(cLow));
    // If both Values Exist Then Calculate the Value
    // Based off of the string
    if (pszHigh && pszLow)
    {
        cValue = (TCHAR) (((pszHigh - szHex) << 4) + (pszLow - szHex));
        return (cValue);
    }
    return('?');
}

// Returns the String
LPVOID TranslateCGI(LPTSTR pszString)
{
    LPTSTR pszIndex = pszString;
    LPTSTR pszReturn = pszString;

    // unescape special characters
    while (*pszIndex)
    {
        // Translate '+' to Spaces
        if (*pszIndex == _T('+'))
            *pszReturn++ = _T(' ');
        // Translate Hex Strings to characters, but only when
        // two more characters follow the '%'
        else if (*pszIndex == _T('%') && pszIndex[1] && pszIndex[2])
        {
            *pszReturn++=ConvertHex(pszIndex[1], pszIndex[2]);
            pszIndex+=2;
        }
        // or just copy the character
        else
            *pszReturn++ = *pszIndex;
        pszIndex++;
    }
    // Terminate the End
    *pszReturn = _T('\0');
    return (LPVOID) pszString;
}

DWORD GetValue(LPTSTR szCGI, LPCTSTR szName, LPTSTR szValue, DWORD dwValueSize)
{
    LPTSTR szIndex=szCGI;
    LPTSTR szEnd;
    DWORD dwReturnSize=0;

    // Error: No Name was passed in
    if (!*szName)
        return (0);
    // Find The Name in the Query String; it has to be followed by '='
    while ((szIndex=_tcsstr(szIndex,szName))!=NULL)
    {
        szIndex+=_tcslen(szName);
        if (*szIndex==_T('='))
            break;
    }
    // Error: The Name part of the Name value pair doesn't exist
    if (!szIndex)
        return (0);
    // Increase the pointer past the '=' and Get to the Value
    szIndex++;
    // Find the End of the Value by looking for the next '&'
    szEnd=_tcschr(szIndex,_T('&'));
    // if we find a '&' set it as the end
    if (szEnd)
        (*szEnd)=_T('\0');
    // Remove the CGI Syntax
    TranslateCGI(szIndex);
    // Calculate the Value Size
    dwReturnSize=_tcslen(szIndex);
    // Chop the Value so that it fits, with its Null,
    // in the Allocation of Value
    if (szValue && dwValueSize && dwReturnSize>=dwValueSize)
        szIndex[dwValueSize-1]=_T('\0');
    // Assign the Value if there is allocated space
    // if no space has been allocated then the caller
    // is just looking for the string size
    if (szValue && dwValueSize)
        _tcscpy(szValue,szIndex);
    // If we are going to return the size of
    // Allocated space we might as well
    // include the Null
    return (dwReturnSize+1);
}

// Returns The Length of szValue on successful execution else returns 0
DWORD GetMethod(LPCTSTR szName, LPTSTR szValue, DWORD dwValueSize)
{
    DWORD dwBufferSize=0;
    DWORD dwReturnSize=0;
    LPTSTR szQuery=NULL;

    // Call GetEnvironmentVariable To get the buffer size
    dwBufferSize=GetEnvironmentVariable(_T("QUERY_STRING"),szQuery,dwBufferSize);
    // Error: QUERY_STRING doesn't exist
    if (!dwBufferSize)
        return(0);
    // Allocate the Space needed
    szQuery = new TCHAR[dwBufferSize];
    // Call Again
    dwBufferSize=GetEnvironmentVariable(_T("QUERY_STRING"),szQuery,dwBufferSize);
    // Get the Value From the Query String
    dwReturnSize=GetValue(szQuery,szName,szValue,dwValueSize);
    delete [] szQuery;
    return (dwReturnSize);
}

// Returns The Content Length on successful execution else returns 0
DWORD GetContentLength()
{
    DWORD dwBufferSize=0;
    LPTSTR szContentLength=NULL;
    DWORD dwContentLength;

    // Call GetEnvironmentVariable to get the buffer size
    dwBufferSize=GetEnvironmentVariable(_T("CONTENT_LENGTH"), szContentLength,dwBufferSize);
    // Error: CONTENT_LENGTH doesn't exist
    if (!dwBufferSize)
        return(0);
    // Allocate the Needed Space
    szContentLength = new TCHAR[dwBufferSize];
    // Call Again
    dwBufferSize=GetEnvironmentVariable(_T("CONTENT_LENGTH"), szContentLength,dwBufferSize);
    // Change the String to a usable form
    dwContentLength=(DWORD)_ttoi(szContentLength);
    delete [] szContentLength;
    return(dwContentLength);
}

// Returns The Length of szValue on successful execution else returns 0
DWORD PostMethod(LPCTSTR szName, LPTSTR szValue, DWORD dwValueSize)
{
    DWORD dwBufferSize;
    LPTSTR szContentType=NULL;
    DWORD dwContentTypeSize=0;
    LPTSTR szPost;
    DWORD dwReturnSize=0;
    UINT nCount;
    int nChar;

    if (szCache)
    {
        dwBufferSize=_tcslen(szCache);
        // Allocate Some Memory for szPost Plus the NULL
        szPost=new TCHAR[dwBufferSize+1];
        _tcscpy(szPost,szCache);
    }
    else
    {
        // Look at the CONTENT_TYPE to see if it is a POST
        // Call GetEnvironmentVariable to get the buffer size
        dwContentTypeSize=GetEnvironmentVariable(_T("CONTENT_TYPE"), szContentType,dwContentTypeSize);
        // Error: CONTENT_TYPE doesn't exist
        if (!dwContentTypeSize)
            return(0);
        // Allocate the Needed Space
        szContentType = new TCHAR[dwContentTypeSize];
        // Call Again
        GetEnvironmentVariable(_T("CONTENT_TYPE"),szContentType,dwContentTypeSize);
        if (!_tcscmp(szContentType,_T("application/x-www-form-urlencoded")))
        {
            delete [] szContentType;
            // Figure out the Size of the String
            dwBufferSize=GetContentLength();
            if (!dwBufferSize)
                return(0);
            // Declare the Memory for the String plus the NULL
            szPost = new TCHAR[dwBufferSize+1];
            nCount=0;
            // Read the Standard In, stopping at the end of the input
            while ((nCount<dwBufferSize) && ((nChar=_fgetchar())!=EOF))
            {
                szPost[nCount++]=(TCHAR)nChar;
            }
            szPost[nCount]=_T('\0');
            // Cache the CGI String
            szCache=new TCHAR[dwBufferSize+1];
            _tcscpy(szCache,szPost);
        }
        else
        {
            // Not a POST so return unsuccessful
            delete [] szContentType;
            return(0);
        }
    }
    // We now have the CGI String, Lets Get the Value
    // Get the Value From the Post String
    dwReturnSize=GetValue(szPost,szName,szValue,dwValueSize);
    delete [] szPost;
    return (dwReturnSize);
}

// Returns The Length of szValue on successful execution else returns 0
DWORD GetParameter (LPCTSTR szName, LPTSTR szValue, DWORD dwValueSize)
{
    DWORD dwBufferSize=0;
    LPTSTR szRequestMethod=NULL;

    // Look at the environment variable REQUEST_METHOD
    // Call GetEnvironmentVariable to get the buffer size
    dwBufferSize=GetEnvironmentVariable(_T("REQUEST_METHOD"), szRequestMethod,dwBufferSize);
    // Error: REQUEST_METHOD doesn't exist
    if (!dwBufferSize)
        return(0);
    // Allocate the Needed Space
    szRequestMethod = new TCHAR[dwBufferSize];
    // Call Again
    dwBufferSize=GetEnvironmentVariable(_T("REQUEST_METHOD"), szRequestMethod,dwBufferSize);
    // It has to be POST or GET
    if (!_tcscmp(szRequestMethod,_T("GET")))
    {
        delete [] szRequestMethod;
        return(GetMethod(szName,szValue,dwValueSize));
    }
    if (!_tcscmp(szRequestMethod,_T("POST")))
    {
        delete [] szRequestMethod;
        return(PostMethod(szName,szValue,dwValueSize));
    }
    delete [] szRequestMethod;
    return(0);
}

int main( int argc, char *argv[ ], char *envp[ ] )
{
    DWORD dwValueSize=0;
    LPTSTR szValue=NULL;

    // Header
    _tprintf(_T("Status: 200\r\n"));
    _tprintf(_T("Content-type: text/html\r\n"));
    _tprintf(_T("\r\n"));
    // Body
    _tprintf(_T("<HTML><BODY>\n"));
    // Find out how big the name parameter is going to be
    dwValueSize=GetParameter(_T("Name"),NULL,0);
    // Allocate enough space for The Value of Name
    szValue=new TCHAR[dwValueSize];
    // Get The Value Again this time with a big enough buffer
    dwValueSize=GetParameter(_T("Name"),szValue,dwValueSize);
    // Display the Name
    if (dwValueSize)
        _tprintf(_T("Name : %s\n"),szValue);
    delete [] szValue;
    _tprintf(_T("<BR>\n"));
    // Do it all Again for Age, asking for the size first
    dwValueSize=GetParameter(_T("Age"),NULL,0);
    szValue=new TCHAR[dwValueSize];
    dwValueSize=GetParameter(_T("Age"),szValue,dwValueSize);
    if (dwValueSize)
        _tprintf(_T("Age : %s\n"),szValue);
    _tprintf(_T("</BODY></HTML>\n"));
    delete [] szValue;
    // Clean the Cache
    if (szCache)
        delete [] szCache;
    return(0);
}
The main() function calls GetParameter() for each parameter, first Name and then Age, and displays the values passed in by the form. If no value is present or there is an error, GetParameter() returns zero; otherwise, it returns the character length of the value, including the terminating null.
With GetParameter(), no matter what method is used in the form, the value of the named variable will be returned. GetParameter() looks at the environment variable REQUEST_METHOD to figure out if the method is a POST or a GET. If it is a POST method, PostMethod() is called. In PostMethod(), the environment variable CONTENT_TYPE is checked to make sure that the post is coming from a form. CONTENT_LENGTH is also checked so that the right size string can be allocated. The CGI string is then read from standard input as a file would be read. Notice that in PostMethod(), you cache the CGI string read from standard input; standard input can be read only once. If GetParameter() detects that the GET method is used, then GetMethod() is executed. GetMethod() loads the CGI string from QUERY_STRING. Try experimenting with the FORM method, changing the method in lst13_6.htm to POST. Type the URL of lst13_6.htm into your browser's address box. You will need to refresh to get the changes. Now, try resubmitting the data.
Both GetMethod() and PostMethod() call GetValue() after the CGI string is acquired. GetValue() separates the value that is associated with the name from the CGI string. After the value is separated, it's passed to TranslateCGI(). TranslateCGI() changes + to spaces and translates hexadecimal characters.
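The translation step can also be written against std::string, which removes the manual buffer handling. The following is a minimal sketch in portable C++ and is not a replacement for the listing's TranslateCGI(): the names HexDigitValue and DecodeCgi are hypothetical, and the decision to copy a malformed % sequence through unchanged is an assumption about how bad input should be treated.

```cpp
#include <cstddef>
#include <string>

// Value of one hexadecimal digit, or -1 if the character is not one
// (hypothetical helper).
int HexDigitValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decode a CGI-encoded value: '+' becomes a space and "%XX" becomes
// the character with hexadecimal code XX. A '%' that is not followed
// by two hexadecimal digits is copied as it stands.
std::string DecodeCgi(const std::string& encoded) {
    std::string out;
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        const char c = encoded[i];
        if (c == '+') {
            out += ' ';
        } else if (c == '%' && i + 2 < encoded.size() + 0 &&
                   HexDigitValue(encoded[i + 1]) >= 0 &&
                   HexDigitValue(encoded[i + 2]) >= 0) {
            const int value = HexDigitValue(encoded[i + 1]) * 16 +
                              HexDigitValue(encoded[i + 2]);
            out += static_cast<char>(value);
            i += 2;  // skip the two digits just consumed
        } else {
            out += c;
        }
    }
    return out;
}
```

Because the decoded text is built in a new string, a stray % at the very end of the input cannot cause a read past the end of the buffer.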
Notice that TCHAR and the _t generic-text functions were used wherever possible. This allows the code to be compiled for either MBCS or UNICODE.
It's important to consider the length of the strings you allocate to hold the data coming in. Remember that real users mean real data. Consider an input tag that sends in first names. On the Web, users from all over the world can access your page, so their first names might be longer than you expect. If the data overruns the memory you have allocated, the CGI script could crash, causing the server to crash. To solve this problem, you can limit the data coming in by using the MAXLENGTH attribute on input tags. Alternatively, you can handle values of any length, as GetParameter() does. Also, make sure that users cannot send in parameters that make your CGI script crash. Because users can type anything in the URL address of their browser, make sure to test all possibilities. For instance, if the standard URL address and CGI string look like this:
http://MYMACHINE/scripts/mycgi.exe?Name=John
and a user types this instead
http://MYMACHINE/scripts/mycgi.exe?&Name&Name=&&&&&&
can your CGI script handle it without crashing?
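One way to approach that question is to split the CGI string with code that treats every odd case explicitly. The sketch below is portable C++ and only an illustration: SplitPairs is a hypothetical name, values are returned still encoded, and the policy of skipping empty fields and fields with an empty name is an assumption rather than a rule of the CGI standard.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Split a CGI string into (name, value) pairs without assuming that
// the input is well formed (hypothetical helper).
//  - empty fields such as "&&" are skipped
//  - "Name" with no '=' yields the pair ("Name", "")
//  - a field with an empty name, such as "=x", is skipped
std::vector<std::pair<std::string, std::string>> SplitPairs(
    const std::string& cgi) {
    std::vector<std::pair<std::string, std::string>> pairs;
    std::size_t start = 0;
    while (start <= cgi.size()) {
        std::size_t end = cgi.find('&', start);
        if (end == std::string::npos) {
            end = cgi.size();
        }
        const std::string field = cgi.substr(start, end - start);
        if (!field.empty()) {
            const std::size_t eq = field.find('=');
            if (eq == std::string::npos) {
                pairs.emplace_back(field, "");
            } else if (eq > 0) {
                pairs.emplace_back(field.substr(0, eq), field.substr(eq + 1));
            }
        }
        start = end + 1;  // continue after the '&' (or past the end)
    }
    return pairs;
}
```

Feeding the odd string from the URL above into this function yields two pairs, both with the name Name and an empty value, instead of a crash.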
One of the disadvantages of using C++ CGI scripts is that you have to debug them. The optimal way to test CGI scripts is to put them into the scripts directory and have the server call them. This way, they are called just as they would be in practice. Microsoft Developer Studio has an excellent debugging environment, but in order to use the debugger, Microsoft Developer Studio has to call the executable you're testing. Here lies the dilemma: either have the server call the CGI scripts without a debugger or have Microsoft Developer Studio call the CGI script without the benefit of the server.
If the server calls the CGI scripts and there is an ASSERT or an access violation, the process running the CGI script hangs. This usually causes the server not to return a header, leaving the browser waiting until it times out. The reason an ASSERT or an access violation hangs is that it tries to initialize a message box without a valid window handle. The only solution is to kill the process. With the server calling the CGI script, other errors such as returning the wrong output are equally hard to debug. For instance, if no output is returned by the CGI script, the browser displays the screen shown in Figure 13.1.
This type of screen leaves the programmer no information for debugging. Like an ASSERT or an access violation, inserting a DebugBreak() into the CGI script also hangs the process.
The solution is to redirect the output for ASSERTs, warnings, and errors to an output file. The StartDebugging() and StopDebugging() procedures in Listing 13.7 create a file called error.log, and all warnings, errors, and ASSERTs will be written to it. StartDebugging() opens the file and redirects the output. StopDebugging() closes the file handle. Add StartDebugging() to the beginning of main() and StopDebugging() to the end of main().
Listing 13.7. Redirecting debugging information to a file.
#include <crtdbg.h>

#ifdef _DEBUG
// Handle of error.log, shared by StartDebugging() and StopDebugging()
HANDLE hFile=INVALID_HANDLE_VALUE;
#endif

void StartDebugging()
{
#ifdef _DEBUG
    hFile= CreateFile("error.log",GENERIC_WRITE, FILE_SHARE_WRITE,NULL,OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL,NULL);
    // CreateFile returns INVALID_HANDLE_VALUE, not NULL, on failure
    if (hFile!=INVALID_HANDLE_VALUE)
    {
        _CrtSetReportMode(_CRT_WARN, _CRTDBG_MODE_FILE);
        _CrtSetReportFile(_CRT_WARN, hFile);
        _CrtSetReportMode(_CRT_ERROR, _CRTDBG_MODE_FILE);
        _CrtSetReportFile(_CRT_ERROR, hFile);
        _CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE);
        _CrtSetReportFile(_CRT_ASSERT, hFile);
    }
    _RPT0(_CRT_WARN, "Start Debug Reporting\r\n" );
#endif
}

void StopDebugging()
{
#ifdef _DEBUG
    _RPT0(_CRT_WARN, "Stop Debug Reporting\r\n" );
    if (hFile!=INVALID_HANDLE_VALUE)
        CloseHandle(hFile);
#endif
}

int main( int argc, char *argv[ ], char *envp[ ] )
{
    StartDebugging();
    // The Code
    StopDebugging();
    return(0);
}
Notice that _RPT0() can be used to write to error.log. This function could be used to mark entrances and exits of procedures, output variables, and other information for debugging. The error file will not be created with release builds and shouldn't be used with multiple processes running.
Unless you precede the error filename with the full path, the file will be created in the scripts directory. This could be a security issue if your users can read the scripts directory. Either redirect the output out of the Web space or make sure that the directory security is set to execute only.
The other option for debugging your C++ CGI script is to call the CGI script from Microsoft Developer Studio. The advantage of this, compared to having the Web server call the script, is that you can set break points and view variables with Microsoft Developer Studio. Problems start to arise, however, when the input to your CGI script is considered. Because the CGI script retrieves its input from environment variables set by the server, environment variables need to be set to debug. Microsoft Developer Studio doesn't have an easy way to set environment variables, so either the programmer needs to set the variables in the System Properties or the variables need to be set in the program. Setting the variables in the System Properties requires the programmer to open the control panel, change the variables, and restart Microsoft Developer Studio. It's easier to set the variables in the program, recompile the program, and run it from the debugger. Listing 13.8 is an example of setting the variables at the start of main().
Listing 13.8. Inserts for the Setting of the Variables
int main( int argc, char *argv[ ], char *envp[ ] )
{
    SetEnvironmentVariable("QUERY_STRING","Name=John+Doe&Age=25");
    SetEnvironmentVariable("REQUEST_METHOD","GET");
This is also a great deal of trouble because every time the strings change, you need to recompile.
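One way to reduce the recompiling, assuming your debugger lets you supply program arguments, is to fall back on the command line whenever the server has not set QUERY_STRING. The following is a minimal sketch in portable C++; ResolveQueryString is a hypothetical name, and treating the first argument as the test query string is an assumption made only for this example.

```cpp
#include <string>

// Choose the query string to work with (hypothetical helper).
//  1. If the server set QUERY_STRING, envValue is used.
//  2. Otherwise the first command-line argument, if present, stands in
//     for it so that test strings can change without a recompile.
//  3. Otherwise the result is empty.
std::string ResolveQueryString(const char* envValue,
                               int argc,
                               const char* const argv[]) {
    if (envValue != nullptr && *envValue != '\0') {
        return envValue;
    }
    if (argc > 1 && argv[1] != nullptr) {
        return argv[1];
    }
    return "";
}
```

Inside main(), the call would look like ResolveQueryString(std::getenv("QUERY_STRING"), argc, argv), with std::getenv coming from <cstdlib>. When the Web server runs the script, the environment variable wins and the fallback is never used.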
The POST method poses another problem. Microsoft Developer Studio does not allow you to pipe standard input into a program that you are running in the debugger. Because the Web server sends the POST information to the program through standard input, there is really no way to test the POST method with Microsoft Developer Studio.
There is no surefire, easy method for testing C++ CGI scripts with either the Web server or Microsoft Developer Studio. These issues get resolved with ISAPI Server Extensions because Microsoft Developer Studio can handle Server Extensions DLLs better.
The Common Gateway Interface (CGI) is a standardized parameter passing syntax. CGI scripts are programs that are executed by the Web server, that read a CGI parameter string, and that output to the client. Because the data returned to the browser differs for every execution, CGI scripts create dynamic Web pages. A CGI script can be written in any language the Web server can execute. The server passes parameters to the CGI script with environment variables and standard input, and the CGI script passes the output to the server with standard output. A good CGI script can read the CGI string passed in by both the GET and POST methods. Although debugging CGI scripts is not straightforward, you can use them to create dynamic and interactive Web pages quickly.