November 2007


PHP | 28 Nov 2007 05:49 pm

Update! (3/31/2008): Now works with Zend Framework 1.5!
Update! (3/19/2008): I have submitted an official proposal for this to be included into the Zend Framework.
Update! (1/12/09): Dominik Deobald was kind enough to supply a UTF character encoding patch and bug fix. I have attached his fix as a .patch file at the end of the post.

I have recently been working with Zend_Pdf to create PDF documents. One of the basic requirements for my project is the ability to center text on the page. I was quite disappointed to find that Zend_Pdf currently has no text layout support at all: you can only draw text at an exact position.

Building on an idea from FPDF, I created a small extension to Zend_Pdf called Zend_Pdf_Cell. Zend_Pdf_Cell is not currently supported by Zend and is not officially part of the Zend Framework, although I have posted my code for them to review and give feedback on (and hopefully incorporate into Zend_Pdf). On my first try I misunderstood parts of Zend_Pdf, but I have since gone back and fixed it.

Features

These are the features I have used the most, and they mostly work.
* The ability to create a cell and place text in it.
* Specify the width and height of a cell
* Position a cell (one or more of these combined)
** To the left
** To the right
** At the bottom
** At the top
** Centered horizontally
** Centered vertically
* Align text within a cell
** Left
** Right
** Centered
** (To be done later) Justify
* Format different parts of the text in different fonts

Experimental

I added these only recently, so they may be incomplete or may not work properly.
* Create a border around the cell
* Word wrap text around the cell

Installation

To install this, place the Cell.php file in Zend/Pdf/, then add the following to your PHP file:

include_once('Zend/Pdf/Cell.php');

Examples

The following is an example of how to use the Zend Pdf Cell:

<?php
 require_once 'Zend/Pdf.php';
 include_once('Zend/Pdf/Cell.php');

 //Create the document and add an A4 page to it
 $pdf = new Zend_Pdf();
 $pdf->pages[] = new Zend_Pdf_Page(Zend_Pdf_Page::SIZE_A4);
 $font = Zend_Pdf_Font::fontWithName(Zend_Pdf_Font::FONT_TIMES_ITALIC);
 $pdf->pages[0]->setFont($font, 12);

 //Creates a cell in the specified page
 $cell = new Zend_Pdf_Cell($pdf->pages[0]);

 //Adds a cell in the upper left with "Hello World"
 $cell->addText("Hello World");
 $cell->write();

 //Creates a cell in the center of the page.
 //To position it at the top right instead, OR together
 //POSITION_RIGHT and POSITION_TOP.
 $cell = new Zend_Pdf_Cell($pdf->pages[0],
                           Zend_Pdf_Cell::POSITION_CENTER_X |
                           Zend_Pdf_Cell::POSITION_CENTER_Y);
 //Add a 1 pixel border
 $cell->setBorder(1);
 //Align the text to the right
 $cell->addText("The quick brown fox jumped over the lazy dog",
                Zend_Pdf_Cell::ALIGN_RIGHT);
 $cell->write();

 //Save or render the document as usual, e.g. $pdf->save('example.pdf');
?>
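
To give a feel for how combined position flags can be turned into coordinates, here is a rough sketch. It is not the code from Cell.php: the constant values and the resolveCellOrigin() helper are made up for illustration, and the defaults (left and top) are only assumed from the "upper left" behaviour in the example above. The one thing taken from PDF itself is that coordinates start at the bottom-left corner of the page.

```php
<?php
// Hypothetical flag values chosen for this sketch only;
// the real Zend_Pdf_Cell constants may differ.
const POS_LEFT     = 1;
const POS_RIGHT    = 2;
const POS_TOP      = 4;
const POS_BOTTOM   = 8;
const POS_CENTER_X = 16;
const POS_CENTER_Y = 32;

/**
 * Returns [x, y] of the cell's lower-left corner in PDF points.
 * PDF coordinates start at the bottom-left corner of the page.
 */
function resolveCellOrigin(float $pageW, float $pageH, float $cellW, float $cellH, int $position): array
{
    $x = 0.0;                          // assumed default: left edge
    if ($position & POS_RIGHT) {
        $x = $pageW - $cellW;
    } elseif ($position & POS_CENTER_X) {
        $x = ($pageW - $cellW) / 2;
    }

    $y = $pageH - $cellH;              // assumed default: top edge
    if ($position & POS_BOTTOM) {
        $y = 0.0;
    } elseif ($position & POS_CENTER_Y) {
        $y = ($pageH - $cellH) / 2;
    }

    return [$x, $y];
}

// An A4 page is 595 x 842 points; center a 200 x 50 cell on it.
[$x, $y] = resolveCellOrigin(595, 842, 200, 50, POS_CENTER_X | POS_CENTER_Y);
printf("%.1f, %.1f\n", $x, $y); // 197.5, 396.0
```

Because each flag occupies its own bit, a horizontal flag and a vertical flag can be ORed together and tested independently with the & operator.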

If you have any questions or wish to report problems, please use the comment box below. I would also like to hear if you have successfully implemented this!

Files

Cell.php – Zend Framework 1.0.*
Cell.php – Zend Framework 1.5.*
(Updated 1/12/08)
UTF character patch by Dominik

Development and School | 25 Nov 2007 03:04 am

Hanzi Recognizer is a project that grew out of CS490t at Purdue University, written by Nathan Hobbs and me. The information below is current as of 11/24/2007.

Features

When you draw a Chinese character in the drawing panel and look it up, the program returns the five closest matches. For each match it also shows the definition, the pronunciation, the main radical and the main radical’s definition.

[Screenshot: hanzirecognizer.png]

High Level Implementation Details

The stroke recognition and scoring algorithm isn’t the greatest, but it is based on JavaDict, a 10-year-old Java 1.2 program. A stroke is everything drawn from pen down until pen up. The algorithm turns each stroke into a number that represents the direction in which the stroke was drawn. A stroke that changes direction (like a capital L) is registered as two directions.

Directional strokes:
         7   8   9
          \  |  /
        4 ---5--- 6
          /  |  \
         1   2   3

So an L would be considered a 26 stroke: down (2), then right (6). The recognizer currently scores the drawn strokes against every stroke sequence in the database, with the lowest score being the best match.
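
As a rough sketch of the direction encoding (not the JavaDict code or ours), the snippet below turns a list of pen points into keypad digits. The function names are made up, screen coordinates are assumed (y grows downward), and real pen input would need smoothing that is left out here. The center digit 5 is never produced.

```php
<?php
/**
 * Map one pen movement to a keypad direction digit
 * (8 = up, 2 = down, 4 = left, 6 = right, diagonals in between).
 * Assumes screen coordinates, where y grows downward.
 */
function directionDigit(float $dx, float $dy): int
{
    // Angle in degrees with 0 = right and 90 = up (hence the flipped y).
    $angle = rad2deg(atan2(-$dy, $dx));
    if ($angle < 0) {
        $angle += 360;
    }
    // Eight 45-degree sectors, starting at "right" and going anticlockwise.
    $digits = [6, 9, 8, 7, 4, 1, 2, 3];
    return $digits[((int) floor(($angle + 22.5) / 45)) % 8];
}

/**
 * Encode a stroke (a list of [x, y] points from pen down to pen up)
 * as a string of direction digits, collapsing consecutive repeats.
 */
function encodeStroke(array $points): string
{
    $code = '';
    for ($i = 1; $i < count($points); $i++) {
        $dx = $points[$i][0] - $points[$i - 1][0];
        $dy = $points[$i][1] - $points[$i - 1][1];
        if ($dx == 0 && $dy == 0) {
            continue; // the pen did not move
        }
        $digit = (string) directionDigit($dx, $dy);
        if (substr($code, -1) !== $digit) {
            $code .= $digit;
        }
    }
    return $code;
}

// A capital L: straight down, then straight right.
echo encodeStroke([[0, 0], [0, 50], [0, 100], [50, 100], [100, 100]]), "\n"; // 26
```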

We also have a Unistrok file that describes how to draw every character. A line in this file looks similar to:

6c34 | 61 2 1 3

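The format is [unicode | strokes]: the character’s Unicode code point in hexadecimal, a pipe, and then the strokes separated by spaces. Here, 6c34 is 水.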
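
Reading such a line could look roughly like this. It is a sketch that assumes every line follows exactly the layout shown above; parseUnistrokLine() is a made-up name, and the real file may contain variations this does not handle.

```php
<?php
/**
 * Parse one line such as "6c34 | 61 2 1 3" into a code point and strokes.
 * Returns null for lines that do not match the assumed layout.
 */
function parseUnistrokLine(string $line): ?array
{
    $parts = explode('|', $line, 2);
    if (count($parts) !== 2) {
        return null;
    }
    $hex = trim($parts[0]);
    if ($hex === '' || !ctype_xdigit($hex)) {
        return null;
    }
    // Split the stroke list on whitespace, dropping empty pieces.
    $strokes = preg_split('/\s+/', trim($parts[1]), -1, PREG_SPLIT_NO_EMPTY);
    return ['codePoint' => hexdec($hex), 'strokes' => $strokes];
}

$entry = parseUnistrokLine('6c34 | 61 2 1 3');
printf("U+%04X has %d strokes\n", $entry['codePoint'], count($entry['strokes']));
// U+6C34 has 4 strokes
```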

To Do

We need a better scoring algorithm. Currently, if too many or too few strokes are drawn, we just arbitrarily give the character a bad score. This could be improved by implementing an edit distance algorithm.
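
For illustration, a plain Levenshtein distance over whole strokes might look like the sketch below. This is the textbook algorithm, not something we have implemented; strokeEditDistance() is a made-up name, and giving every insertion, deletion and substitution a cost of 1 is an assumption that would probably need tuning.

```php
<?php
/**
 * Levenshtein distance between two lists of stroke codes.
 * Each insertion, deletion or substitution of a whole stroke costs 1.
 */
function strokeEditDistance(array $a, array $b): int
{
    $a = array_values($a);
    $b = array_values($b);
    $prev = range(0, count($b)); // distances from an empty $a prefix
    for ($i = 1; $i <= count($a); $i++) {
        $curr = [$i];
        for ($j = 1; $j <= count($b); $j++) {
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,         // delete a stroke from $a
                $curr[$j - 1] + 1,     // insert a stroke into $a
                $prev[$j - 1] + $cost  // substitute (or match)
            );
        }
        $prev = $curr;
    }
    return $prev[count($b)];
}

// The drawing is missing one stroke compared to the stored character.
echo strokeEditDistance(['61', '2', '1', '3'], ['61', '2', '3']), "\n"; // 1
```

With a score like this, a drawing with one stroke too many or too few is penalized by one step instead of being rejected outright.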

We would like to allow characters to be searched by pronunciation or radical.

It takes a long time to load. Our characters are stored unsorted, so it takes O(n) time to search for a character. In itself that’s not slow, but doing it 50,000 times is. We need to sort the characters as we store them and search with a binary search (or hash them and search in O(1)).
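
The hashing option can be sketched with a PHP associative array keyed by code point. The record layout and the indexByCodePoint() helper are made up for illustration, and the second record is an invented example rather than real data.

```php
<?php
/** Build an associative array keyed by code point for O(1) average lookups. */
function indexByCodePoint(array $records): array
{
    $index = [];
    foreach ($records as $record) {
        $index[$record['codePoint']] = $record;
    }
    return $index;
}

// Illustrative records only; the second entry is made up for this sketch.
$records = [
    ['codePoint' => 0x6C34, 'strokes' => ['61', '2', '1', '3']],
    ['codePoint' => 0x4E00, 'strokes' => ['6']],
];
$index = indexByCodePoint($records);

// One hash lookup instead of a scan over every record.
$hit = $index[0x6C34] ?? null;
echo $hit === null ? "not found\n" : implode(' ', $hit['strokes']) . "\n"; // 61 2 1 3
```

Building the index is a single O(n) pass; after that, each of the 50,000 lookups no longer has to scan the whole list.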

Nate just implemented some basic speech functionality. We still need to get some speakers to record the pronunciations and then hook the recordings in.

We are currently creating our own XML file format to store the characters so we don’t have to get information from 3 different databases. This file can be rebuilt as the others are updated, and by itself it should cut load times by almost a full order of magnitude.

We need to add support for more variants of radicals and for simplified / traditional characters. Currently, there are many places where we (seemingly) arbitrarily choose to display either a traditional or a simplified variant.

References

Unihan Database

CEDict

Development | 15 Nov 2007 12:21 am

Make sure you don’t put quotes around the question mark in your prepared statement. A quoted question mark is treated as a literal string, not as a parameter placeholder. It’ll save you some debugging!

Bad:

SELECT * FROM User WHERE name='?'

Good:

SELECT * FROM User WHERE name=?
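
Here is a small way to see the difference for yourself. It is only a sketch: it assumes PHP with the pdo_sqlite extension, and the table contents are made up for the demonstration.

```php
<?php
// Assumes the pdo_sqlite extension is available; table and rows are made up.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO User (name) VALUES ('alice'), ('?')");

// Bad: '?' inside quotes is a string literal, so there is nothing to bind.
// The query only matches a user whose name is literally "?".
$bad = $pdo->query("SELECT name FROM User WHERE name='?'")
           ->fetchAll(PDO::FETCH_COLUMN);

// Good: the bare ? is a placeholder and receives the bound value.
$stmt = $pdo->prepare('SELECT name FROM User WHERE name=?');
$stmt->execute(['alice']);
$good = $stmt->fetchAll(PDO::FETCH_COLUMN);

echo 'quoted:   ', implode(', ', $bad), "\n";
echo 'unquoted: ', implode(', ', $good), "\n";
```

The quoted version never sees the value you meant to pass in, which is why the results look wrong even though the SQL itself is valid.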