This PHP script opens and reads the contents of a Microsoft Word document (.doc only) using a COM object. It requires Windows with Microsoft Word installed, and it runs from the CLI as well as in the browser.

1. php.ini

  1. Open your php.ini file and find the following section:
;;;;;;;;;;;;;;;;;;;;;;
; Dynamic Extensions ;
;;;;;;;;;;;;;;;;;;;;;;
  2. Add the following entry on a blank line in the above section:
    extension=php_com_dotnet.dll
  3. Search for "allow Distributed-COM calls" and uncomment the following line (delete the leading semicolon):
    ; com.allow_dcom = true
  4. Your php.ini should now read as follows:
; allow Distributed-COM calls
; http://php.net/com.allow-dcom
com.allow_dcom = true
  5. Save and close your php.ini.
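
Before moving on, it can help to confirm that PHP has actually picked up the php.ini changes above. The following is a minimal sketch, not part of the original script: the function name com_setup_report() is hypothetical, while extension_loaded(), ini_get() and php_ini_loaded_file() are standard PHP functions. It assumes the extension registers itself under the name com_dotnet.

```php
<?php
// Hypothetical helper (not from the original post): reports whether the
// php.ini changes are visible to the PHP binary that runs this file.
function com_setup_report(): array
{
    return [
        // True once extension=php_com_dotnet.dll has been loaded.
        'com_dotnet_loaded' => extension_loaded('com_dotnet'),
        // True when com.allow_dcom is switched on. ini_get() returns false
        // if the setting is not registered, which also maps to false here.
        'allow_dcom' => filter_var(ini_get('com.allow_dcom'), FILTER_VALIDATE_BOOLEAN),
        // Path of the php.ini that was read, or false if none was loaded.
        'php_ini' => php_ini_loaded_file(),
    ];
}

// Print one "name: value" line per check.
foreach (com_setup_report() as $name => $value) {
    echo $name, ': ', var_export($value, true), "\n";
}
```

The CLI and the web server may read different php.ini files, so it is worth running this check in both places if the script works in one but not the other.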

2. Windows Component Services

The following solution was adapted from https://forums.phpfreaks.com/topic/191034-word-com-object-throwing-exception/#comment-1007347

  1. Press the Windows key + R (or right-click the Windows Start button and select Run), type dcomcnfg and press Enter to open Component Services
  2. Expand Component Services » Computers » My Computer
  3. Select DCOM Config
  4. Search for and select Microsoft Word 97 – 2003 Document (the name is translated to your Windows language, so it may take a while to find)
  5. Right-click on it and open Properties
  6. Choose the Identity tab
  7. Normally this is set to The launching user; change it to The interactive user (or use the This user option to select an admin user of your choice).
  8. Apply these new settings and test your COM application. It should work fine now.

Remember to restart Apache after changing your configuration!

3. PHP script

The following PHP was adapted from this example (minus one or two syntax errors) https://www.tek-tips.com/viewthread.cfm?qid=1692863#post-6896134 :

<?php
echo "Starting...\n";

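set_time_limit(60); // Allow time for Word to load.
try {
    // new COM() throws on failure instead of returning false,
    // so an "or die()" after it would never run.
    $word = new COM("word.application");
} catch (Throwable $e) {
    die("Could not initialise MS Word object: " . $e->getMessage() . "\n");
}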

echo "COM instantiated\n";
$word->Application->Visible = false;
echo "Set visibility to false\n";

$doc = 'test.doc';
$document = realpath($doc);

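if (is_readable($document)) :
    echo "Document exists and is readable.\n";
else :
    // realpath() returns false for a missing file, so is_file() fails too.
    if (!is_file($document)) :
        echo "Document does not exist\n";
    else :
        echo "Document is not readable\n";
    endif;
    // Quit Word before exiting so no hidden WINWORD.EXE process is left running.
    $word->Quit();
    die();
endif;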

$word->Documents->Open($document);

// Extract content. Content is a Word Range object, not a string.
$content = $word->ActiveDocument->Content;
echo "Raw COM object\n----------\n";
print_r($content);
echo "\n----------\n";
echo "Extracting string value of content\n";
// Casting the Range to a string yields the document text.
$content = (string) $content;
echo "Document text\n----------\n";
echo $content;
echo "\n----------\n";

$word->ActiveDocument->Close(false);
echo "Closed Document\n";
$word->Quit();
echo "Quit Word \n";
$word = null;
unset($word);
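
If the script above is reused elsewhere, it may be worth wrapping it so that Word is always shut down, even when opening the document fails. The following is a minimal sketch under stated assumptions rather than a definitive implementation: read_word_text() and normalise_word_text() are hypothetical names that do not appear in the original script. It relies on Word returning paragraph marks as bare carriage returns ("\r") and manual line breaks as vertical tabs ("\x0B"), which can make CLI output appear to overwrite itself.

```php
<?php
// Hypothetical helper: converts Word's paragraph marks ("\r") and manual
// line breaks ("\x0B") to "\n" and trims surrounding whitespace.
function normalise_word_text(string $text): string
{
    return trim(str_replace(["\r\n", "\r", "\x0B"], "\n", $text));
}

// Hypothetical wrapper around the steps in the script above.
// $path should be an absolute path, for example the result of realpath().
function read_word_text(string $path): string
{
    if (!class_exists('COM')) {
        throw new RuntimeException('The com_dotnet extension is not loaded.');
    }

    $word = new COM('word.application');
    try {
        $word->Visible = false;
        $document = $word->Documents->Open($path);
        // Read the text of the whole document via the Range's Text property.
        $text = (string) $document->Content->Text;
        $document->Close(false); // false = do not save changes
        return normalise_word_text($text);
    } finally {
        // Runs on success and on failure, so Word is never left running.
        $word->Quit();
    }
}
```

With these two functions defined, a call such as echo read_word_text(realpath('test.doc')); would print the document text with ordinary line breaks.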

4. CLI

Save the script as index.php in the same folder as test.doc, then run it from the Command Prompt or PowerShell:

php index.php


References:

  1. Word COM Object throwing exception (2010). Available at: https://forums.phpfreaks.com/topic/191034-word-com-object-throwing-exception/#comment-1007347 (Accessed: 29 March 2023).
  2. B. (Programmer) and jpadie (TechnicalUser) (2012) Extracting text from Word Documents via PHP and COM. Tek-Tips. Available at: https://www.tek-tips.com/viewthread.cfm?qid=1692863#post-6896134 (Accessed: 17 June 2024).
  3. PHP: Installing/Configuring – Manual (2023). Available at: https://www.php.net/manual/en/com.setup.php (Accessed: 29 March 2023).

By MisterFoxOnline

Mister Fox AKA @MisterFoxOnline is an ICT, IT and CAT Teacher who has just finished training as a Young Engineers instructor. He has a passion for technology and loves to find solutions to problems using the skills he has learned in the course of his IT career.
