Introduction
Discussion for this module takes place on the
php-xpath-users@lists.sourceforge.net mailing list. The latest version of
the source is available from SourceForge at
http://www.sourceforge.net/projects/phpxpath. The module is maintained
by:
Known Bugs Outstanding:
-
Bug:
Use of the "string()" function will probably not work; leave the class to do
this conversion itself and you should be OK.
-
RFE:
Support text, comment and processing instruction nodes.
-
RFE:
Character entities.
-
RFE: When reading in an xml file, don't read the whole file
into memory, but read a chunk at a time and send it to the XML parser.
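The last RFE above (reading the XML file a chunk at a time) can be sketched with an incremental parser. This is Python rather than the class's PHP, using the standard xml.parsers.expat module; count_elements_streaming() and the 64 KiB default chunk size are illustrative assumptions, not part of Php.XPath.

```python
import io
import xml.parsers.expat
from typing import BinaryIO


def count_elements_streaming(stream: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Feed an XML document to expat one chunk at a time and count elements.

    Only `chunk_size` bytes of the source are read per call; expat keeps
    whatever partial markup it needs between calls.
    """
    parser = xml.parsers.expat.ParserCreate()
    count = 0

    def on_start(name: str, attrs: dict) -> None:
        nonlocal count
        count += 1

    parser.StartElementHandler = on_start
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)  # False: more data will follow
    parser.Parse(b"", True)  # True: end of document, so truncation is reported
    return count


if __name__ == "__main__":
    doc = io.BytesIO(b"<root><item/><item/></root>")
    print(count_elements_streaming(doc, chunk_size=4))  # 3
```

With a real file you would pass `open(path, "rb")`; the tiny chunk size in the example deliberately splits tags across reads to show that the parser copes.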
Latest version
Php.XPath Version 3.5 - 13th August 2004
-
Upgrades to work with PHP5:
-
Remove assignment to $this: My thanks go to the many
people who have reported this and provided a fix long before I got round to
fixing it.
-
Don't use translateAmpersand: Andrej Arn kindly reported
that the translateAmpersand function is no longer required in PHP5, and
provided a fix so that it is circumvented when phpversion() reports version 5.
-
Improved XPath Expression parsing:
-
Correct XPath expression parsing: More precisely
follow the specification correcting a number of boundary expression cases.
-
Correct parsing of node names: Permit namespaces in
node names and the "." character.
-
Handle brackets correctly: Now properly supports
nested bracket expressions like "(() and ())" or "() and ()".
-
Cache static parsing results to improve performance: Because
of the evaluation structure required by the XPath specification, we were
parsing the same sub-expressions repeatedly. Caching part of this
parsing gives a 20% improvement in scan times.
-
Improved XPath Expression evaluation:
-
Support comparison XPath operators on node sets: Previously
the comparison operators <, >, >= and <= would not work if one of
the arguments was a node set.
-
Treat operators involving node sets correctly: If
both arguments are node sets, then we should compare entries
using the string operator, not the integer operator. If one operand
is a node set we should match the other against every item in the node set
rather than converting the node set to the type of the other operand.
-
string(true()) should return "true" not "TRUE": Possible
breaking change!! Closer analysis of the XPath specification
says that we were doing it wrong initially :o( Same goes
for false/FALSE.
-
Improved debugging: Turning on debugging can now be done
at run time through an array of function names rather than at compile
time.
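The "Cache static parsing results" entry above is a form of memoisation: parsing is a pure function of the expression text, so its result can be cached and reused. A minimal Python sketch, not the class's PHP code; the tokenizer regex and the name parse_subexpression are hypothetical, and the real cache may well store a different structure.

```python
import re
from functools import lru_cache

# Hypothetical token pattern covering a small subset of XPath 1.0 syntax.
_TOKEN = re.compile(
    r"\s*(//|/|\[|\]|\(|\)|@|\*|!=|<=|>=|[=<>]|'[^']*'|\"[^\"]*\"|[\w.:-]+)"
)


@lru_cache(maxsize=1024)
def parse_subexpression(expr: str) -> tuple[str, ...]:
    """Tokenise an XPath sub-expression; repeated calls hit the cache.

    A tuple is returned because cached values are shared between callers
    and must not be mutated.
    """
    text = expr.strip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tuple(tokens)


if __name__ == "__main__":
    for _ in range(5):
        parse_subexpression("Child = 'x'")
    print(parse_subexpression.cache_info())
    # CacheInfo(hits=4, misses=1, maxsize=1024, currsize=1)
```

Returning an immutable tuple is the key design choice: a cached result that callers could modify would silently corrupt later lookups.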
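The two node-set entries in the 3.5 notes above follow the XPath 1.0 comparison rules (section 3.4 of the recommendation): a comparison involving a node set is true if it holds for at least one node. In that spec, = and != between two node sets compare string-values, while <, <=, > and >= convert both sides to numbers. A Python sketch, assuming a node set is represented simply as a list of string-values and leaving boolean operands out; compare() and to_number() are hypothetical names.

```python
import operator
from typing import Callable, Union

Scalar = Union[str, int, float]
Value = Union[Scalar, list]  # a list stands for a node set of string-values

_EQUALITY: dict[str, Callable] = {"=": operator.eq, "!=": operator.ne}
_RELATIONAL: dict[str, Callable] = {
    "<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge,
}


def to_number(value: Scalar) -> float:
    """Simplified XPath number(): unparsable strings become NaN."""
    try:
        return float(value)
    except ValueError:
        return float("nan")


def _compare_scalars(op: str, a: Scalar, b: Scalar) -> bool:
    if op in _RELATIONAL:  # relational operators always compare numbers
        return _RELATIONAL[op](to_number(a), to_number(b))
    if isinstance(a, (int, float)) or isinstance(b, (int, float)):
        return _EQUALITY[op](to_number(a), to_number(b))
    return _EQUALITY[op](str(a), str(b))  # string against string


def compare(op: str, left: Value, right: Value) -> bool:
    """Existential comparison: true if any pairing of items satisfies `op`."""
    lefts = left if isinstance(left, list) else [left]
    rights = right if isinstance(right, list) else [right]
    return any(_compare_scalars(op, a, b) for a in lefts for b in rights)


if __name__ == "__main__":
    prices = ["5", "12", "30"]
    print(compare(">", prices, 20))       # True, because 30 > 20
    print(compare("<", ["2"], ["10"]))    # True: numeric, not string, order
```

One consequence that surprises people: for a node set such as `prices`, both `= "12"` and `!= "12"` are true, since each only needs one matching node.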
Previous versions
Php.XPath Version 3.4 - 29th May 2003
-
Case insensitive searching support:
Added x-lower() and x-upper() XPath functions that permit case-insensitive
searching.
-
!= works for strings: previously it only worked for booleans.
Thanks to Ricardo Garcia rickg22@users.sourceforge.net
for the bug report and fix.
-
Auto detect dirty indexes:
DOM modifications with autoIndex=FALSE will flag the index as dirty, and
reindexNodeTree() will be called "late", just before it's needed.
-
handleFunction_string() decodes htmlentities:
XPath searching is therefore done on the decoded text rather than on raw HTML entities.
-
Literals removed before replacing - with _:
Previously if you had a "-" in a literal string, it would be wrongly replaced
by "_".
-
Support for generate-id: Permits easy authoring of tables of contents
built from href="#..." links and matching id= anchors. Thanks to Ricardo Garcia
rickg22@users.sourceforge.net
for the RFE and preliminary implementation.
-
Removed call-time pass-by-reference: required for the class to parse without
errors in PHP 4.3. Patch supplied by Torben Wölm wolm@users.sourceforge.net
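The "Auto detect dirty indexes" entry in the 3.4 notes describes a lazy-rebuild pattern: modifications only mark the index stale, and the expensive rebuild happens the next time the index is read. A Python sketch of that pattern, assuming a flat list of node paths as a stand-in for the real node tree; NodeIndex and its methods are hypothetical, with only the autoIndex idea taken from the entry.

```python
class NodeIndex:
    """Hypothetical index from node path to document position, rebuilt lazily."""

    def __init__(self, auto_index: bool = True) -> None:
        self.auto_index = auto_index
        self._paths: list[str] = []          # stand-in for the node tree
        self._position: dict[str, int] = {}
        self._dirty = False
        self.rebuilds = 0                    # counts full reindexes, for illustration

    def _reindex(self) -> None:
        self._position = {path: i for i, path in enumerate(self._paths)}
        self._dirty = False
        self.rebuilds += 1

    def append(self, path: str) -> None:
        self._paths.append(path)
        if self.auto_index:
            self._reindex()                  # eager: index is always current
        else:
            self._dirty = True               # lazy: just remember it is stale

    def position_of(self, path: str) -> int:
        if self._dirty:                      # rebuild "late", right before use
            self._reindex()
        return self._position[path]


if __name__ == "__main__":
    index = NodeIndex(auto_index=False)
    for path in ["/a[1]", "/a[1]/b[1]", "/a[1]/b[2]"]:
        index.append(path)
    print(index.rebuilds)                    # 0
    print(index.position_of("/a[1]/b[2]"))   # 2
    print(index.rebuilds)                    # 1
```

Three modifications cost one rebuild instead of three, which is the same saving the manual reindexNodeTree() facility in 3.0 offers to advanced users.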
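The "Literals removed before replacing - with _" fix can be illustrated with a mask, replace, restore sequence: quoted literals are swapped for placeholders, the substitution runs on what is left, and the literals are put back untouched. A Python sketch, not the class's code; replace_outside_literals() is a hypothetical name, and it assumes neither the expression nor the `old` string contains NUL characters or digits that would collide with the placeholders.

```python
import re

# Single- or double-quoted XPath literals (XPath 1.0 literals have no escapes).
_LITERAL = re.compile(r"'[^']*'|\"[^\"]*\"")


def replace_outside_literals(expr: str, old: str, new: str) -> str:
    """Replace `old` with `new` everywhere except inside quoted literals."""
    literals: list[str] = []

    def stash(match: re.Match) -> str:
        literals.append(match.group(0))
        return f"\x00{len(literals) - 1}\x00"   # placeholder assumed absent from expr

    masked = _LITERAL.sub(stash, expr).replace(old, new)
    return re.sub(r"\x00(\d+)\x00", lambda m: literals[int(m.group(1))], masked)


if __name__ == "__main__":
    print(replace_outside_literals('generate-id(.) = "id-1"', "-", "_"))
    # generate_id(.) = "id-1"
```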
Php.XPath Version 3.3 - 7th October 2002
-
sum() bug:
The argument to the sum() function was not being converted to a node set
properly, so the entries could not be summed.
-
Better support for empty documents:
if you created a blank document and tried to append nodes to the empty document,
then this would create an error.
-
Error checks in insertChild():
to guard against multiple root nodes.
-
match() now supports an XPath Query as base node:
Previously you had to explicitly mention the path that you wanted to search
from, but now you can provide an XPath-Query that resolves to the node you want
to search from.
-
Support for : in node names:
previously a node name of <A:B> couldn't be matched against.
-
Export without XML header:
exportAsXml('','') will export with no XML header, which is useful if your
output is to form part of another XML document.
-
name() works for attributes:
didn't previously
-
Nested predicates now work:
getAxis() has been upgraded so that it will support [][] and [[]] nested
predicates.
-
node() works for attributes:
you can now do @node() which will produce the same as @*
-
Return value from insertChild() corrected: The array contained
the correct number of entries, but the path was completely wrong.
Php.XPath Version 3.2 - 20th July 2002
-
Major restructure of XPath code: Through bug reports from several kind users,
notably Peter Robins phpxpath@peterrobins.co.uk
and Greg Keraunen gk@proliberty.com, it
became clear that the XPath handling wasn't sufficient. On closer inspection
the "top" level syntax token was a
UnionExpr and not an Expr
like it should be. The internal XPath handling has been altered so that this is
now the case. The result of this is that you can now do things like
"count(//*)"
or "name(//*[1])"
which will return numbers or strings rather than node sets.
-
Upgrade to MPL/GPL/LGPL license triple:
to maximise the code's usefulness.
-
Literal strings in XPath "parsed out": Now support XPath
expressions like
//*[Child = 'This = that']
without picking up the "=" as an operator and trying to evaluate left and right
predicates.
-
Support for the node() and text() tests: Greg Keraunen
gk@proliberty.com kindly spotted that our node() test didn't return the
text() nodes, and that the text() test didn't return the text nodes either. So
//text() and //node()
now work. Text nodes still aren't really handled properly, but when we come
round to supporting comment and processing instruction nodes, the text nodes
should end up being supported properly.
-
* should only match element nodes: The "*" was returning ALL
nodes, when in fact it should match only the element nodes. Thanks again to
Greg Keraunen gk@proliberty.com
for reporting this.
-
Context Pos and Context Size were wrong: this could be seen
when evaluating last() and [] with certain XPaths, and was even more
noticeable when you had double predicates [][]. Thanks to Peter Robins
phpxpath@peterrobins.co.uk
for raising this one.
-
Node set results are in doc order:
The XPath spec doesn't demand this, but it is useful if XPath results are
in document order rather than in an arbitrary order.
-
More reluctance to override your xml header:
The class now tries much harder to reuse the xmlHeader from the source or the
one that you supplied to override it. Only uses a default if there was none
available from either the source or the override.
-
Many other small upgrades and fixes: Too numerous and boring
to list, but things that make the class more robust and more resilient to
errors.
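The "Literal strings in XPath "parsed out"" entry amounts to searching for an operator only outside quoted literals; the same scan can also skip anything inside brackets, which is what keeps nested expressions such as "(() and ())" from the 3.5 notes whole. A Python sketch, not the class's parser; find_operator() is a hypothetical name, and the caller is assumed to search for multi-character operators such as != before single-character ones such as =.

```python
def find_operator(expr: str, op: str) -> int:
    """Index of the first `op` that is outside quotes and brackets, else -1.

    Search for longer operators (e.g. "!=", "<=") before shorter ones ("="),
    otherwise "=" would match the second character of "!=".
    """
    quote = None  # the quote character of the literal we are inside, if any
    depth = 0     # nesting depth of ( ) and [ ]
    for i, ch in enumerate(expr):
        if quote is not None:
            if ch == quote:
                quote = None
            continue
        if ch in "'\"":
            quote = ch
        elif ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        elif depth == 0 and expr.startswith(op, i):
            return i
    return -1


if __name__ == "__main__":
    pred = "Child = 'This = that'"
    i = find_operator(pred, "=")
    print(pred[:i].strip(), "|", pred[i + 1:].strip())  # Child | 'This = that'
```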
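The "Context Pos and Context Size were wrong" fix comes down to one rule: each predicate in a chain like [][] sees positions, and a context size for last(), computed from the nodes that survived the previous predicate, not from the original step. A Python sketch for a forward axis; apply_predicates() and the (node, position, size) callback signature are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical predicate signature: (node, context position, context size) -> bool
Predicate = Callable[[str, int, int], bool]


def apply_predicates(nodes: Sequence[str], predicates: Sequence[Predicate]) -> list[str]:
    current = list(nodes)
    for predicate in predicates:
        size = len(current)  # last() for this predicate
        current = [
            node
            for position, node in enumerate(current, start=1)  # position() is 1-based
            if predicate(node, position, size)
        ]
    return current


def even(node: str, pos: int, size: int) -> bool:
    return pos % 2 == 0  # [position() mod 2 = 0]


def last(node: str, pos: int, size: int) -> bool:
    return pos == size  # [last()]


if __name__ == "__main__":
    items = ["a", "b", "c", "d"]
    print(apply_predicates(items, [even, last]))  # ['d']
    print(apply_predicates(items, [last, even]))  # []
```

The second call shows why order matters: after [last()] only one node remains, and it is at position 1, so the even-position predicate rejects it.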
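For the "Node set results are in doc order" entry: a path string alone cannot tell you the order of differently named siblings, so one common approach is to number every node in a pre-order walk and sort results by that number. A Python sketch of that approach; Node, document_order() and sort_in_document_order() are hypothetical and are not the class's data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)


def document_order(root: Node) -> dict[int, int]:
    """Map id(node) to its pre-order position, i.e. its document order."""
    order: dict[int, int] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        order[id(node)] = len(order)
        stack.extend(reversed(node.children))  # so the first child is visited first
    return order


def sort_in_document_order(nodes: list[Node], order: dict[int, int]) -> list[Node]:
    return sorted(nodes, key=lambda n: order[id(n)])


if __name__ == "__main__":
    root = Node("doc", [Node("b", [Node("c")]), Node("d")])
    b, d = root.children
    c = b.children[0]
    order = document_order(root)
    print([n.name for n in sort_in_document_order([d, c, b], order)])  # ['b', 'c', 'd']
```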
Php.XPath Version 3.0 - 8th May 2002
-
Major internal re-structure
allowing the DOM functions to be significantly faster. The whole component is
now in general better engineered and more consistent.
-
Facilities to speed multiple dom accesses
reindexNodeTree() function allows advanced users to optimize performance by
performing several operations before manually asking for the internal
house-keeping. By default calling reindexNodeTree() is completely unnecessary.
-
Performance optimizations in import and export
reading XML files is now 50% faster, and exporting is 40% faster. Check it out
for yourself at the Php.XPath homepage.
-
New functions:
setModMatch(), insertChild(), appendChild(), insertBefore(), decodeEntities(),
getNode(), reindexNodeTree(), cloneNode(), equalNodes(), getNodePath(),
getParentXPath()
-
Substantial work on the DOM test harness The DOMTS has
finished its first revision, so we now take our tests from that zip. Many more
tests are now included in the regression test suite.
Php.XPath Version 2.2 - 2nd February 2002
-
Construction of Dom test harness Given that we are trying to
use DOM style function names, we can hook into the
DomTS
test suite, which is a series of XML files that describe tests. This test
harness is not complete yet, but gives us a basis for regression testing.
-
New functions:
getProperties(), getLastError(), setVerbose(), wholeText(), insertData(),
replaceChildByData(), replaceChild()
-
Bug fixes:
substringData(), deleteData(), appendData(), export()
-
Remove child
Lots of work on this function. It should be more reliable now, but sadly
it's still quite slow.
-
CData support started
You can now losslessly import and export CDATA in and out of the class.
-
DTD and XML declarations
No longer binned; they are not parsed or used at all, just stored and written
back out.
-
/*[last()] fix
was returning more than one node.
-
Return codes Increased effort to return TRUE/FALSE where
possible.
Php.XPath Version 2.1 - 8th October 2001
-
Project rename to Php.XPath Previously known as phpxml, it was
felt that naming the project Php.XPath would emphasise much better what the
project does with XML.
PhpXml Version 1.6 - 4th October 2001
Fixes from this version on are either by Sam or Nigel
unless otherwise specified
-
Dropped the 'N' from the version number
Michael Mehl has "authorised" 1.N.X as the main branch for this class so we
need no longer consider this a branch but the main trunk. Phew!
-
XML errors are handled better
It'll give you some more help if you have bugs in your XML file now
-
Improve the deprecation code
Allowing a two step upgrade to the new DOM interface
-
Changed the class to use the new DOM interface
Randomly the class itself was still using the old interface!!!
-
Small but picky bugs: all over the place really! Thanks go
specifically to Dan bigredlinux@yahoo.com
for many of these.
-
Improved error handling
The class is now more likely to tell you why your call failed. And when it
tells you it will be prettier
-
Major re-write of the removeChild() function. It is much
faster now. Somewhere along the way it had developed some fairly major bugs, ie
when called it would delete the entire $nodes array. oops!
-
export bugs
mainly fairly small things.
-
Fixed $this-> errors: a huge number of embarrassingly minor
errors like "$this->" omissions.
PhpXml Version 1.N.5 - 25th September 2001
-
Text content is output in correct order: If you have
"<P> A <B> B </B> C </P>" this will be output as
"<P> A <B> B </B> C </P>" not "<P> A C <B>
B </B> </P>" as it used to be before this version. Our huge thanks
go to Sam Blum bs_php@users.sourceforge.net
for this fix.
-
50% speed enhancements on reading and searching: Sam gave the
add_content function a serious overhaul, producing clearly visible
speed increases. Sam Blum
bs_php@users.sourceforge.net
-
New DOM style user interface
The public interface was renamed to use DOM style function names. This should
make the learning curve easier for newcomers to the class. The old style
interface is still supported, but is unlikely to be in future releases.
-
Class name change to XPath
After some discussion and thought, it seemed that XPath was a much better name
than "XML" for the class. XML doesn't really tell you what the class does,
whereas XPath says that it is a class that will allow you to search using
the XPath standard, i.e. exactly what it is. The modification functions are kinda
added 'cos the tree is already there, but principally this module is about
searching using XPath and retrieving the content.
-
Filename changed to xpath.class.php
The filename has also been changed to reflect the change of name for the class
-
Documentation added to the 1.N branch
In the form of a Perl script that automatically produces an XML file, which is
combined with an XSL stylesheet to produce the documentation. This should allow you to
quickly update the documentation when you make your own changes.
-
Stacks of style changes. Re-grouping of functions, comments
change to //, etc, etc.
PhpXml Version 1.N.4 - 23rd July 2001
-
Added debugging to get_file_internal()
-
evaluate_step optimisation:
Optimised the evaluate_step function so that it checks for input with an array of
only 1 element. This happens really often: if you search for
/Path1[3]/Path2[6] then you will end up calling evaluate_step with an array
with only 1 element both for Path1 and for Path2.
-
evaluate_predicate bug:
Missing break statements in the switch statement meant that all operators ended
up being evaluated as mod, i.e. frequent div/0 errors. :o(
-
evaluate_predicate optimisation:
Given that evaluating something like contains(Field, 'substring') ends up with
about 10 function calls, it seems like a good idea to avoid it whenever possible,
so if you search using the and operator, it will not needlessly evaluate arg2
if arg1 is false.
-
check_predicates optimization: Reduced function call count by
3 for /Path[position()] XPath searches.
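The evaluate_predicate optimisation above matches what XPath 1.0 itself specifies: the right operand of and is not evaluated if the left operand is false, and the right operand of or is not evaluated if the left operand is true. A Python sketch over a hypothetical tuple-based expression tree (not the class's internal representation), using a call recorder to show which function calls actually run.

```python
from typing import Any

# Hypothetical expression tree for this sketch:
#   ("and", left, right) | ("or", left, right) | ("call", function, *args) | literal value
Expr = Any


def evaluate(expr: Expr) -> Any:
    if isinstance(expr, tuple):
        kind = expr[0]
        if kind == "and":
            # Python's `and` short-circuits: the right operand is only
            # evaluated when the left one is true.
            return bool(evaluate(expr[1])) and bool(evaluate(expr[2]))
        if kind == "or":
            # Likewise the right operand of `or` is skipped when the left is true.
            return bool(evaluate(expr[1])) or bool(evaluate(expr[2]))
        if kind == "call":
            function, *args = expr[1:]
            return function(*(evaluate(arg) for arg in args))
    return expr


if __name__ == "__main__":
    calls = []

    def contains(haystack: str, needle: str) -> bool:
        calls.append(needle)  # record every (expensive) call
        return needle in haystack

    expr = ("and", ("call", contains, "Field text", "zzz"),
            ("call", contains, "Field text", "text"))
    print(evaluate(expr), calls)  # False ['zzz']
```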
PhpXml Version 1.N.3 - 19th July 2001
-
get_names() function added: Function to retrieve the names of
a set of nodes. Written by Andrei Zmievski andrei@php.net
-
Nested functions now supported:
phpxml used to look for the next ) after the opening ( and consider that to be
the end of the function call. Of course when you have nested calls this doesn't work.
Andrei Zmievski noticed and fixed this.
-
XML_OPTION_SKIP_WHITE switched off: If we want to keep spaces
in the CDATA, then we set this to 0. Most people generally want to keep space,
and you can call trim() but there's no untrim() is there? Francis Fillion
investigated this one francisf@videotron.ca.
-
load_string ( $content ): Function now allows you to load the
object from a string instead of a file. Very handy :o) Francis
Fillion francisf@videotron.ca.
wrote this one.
-
Debugging code added: For some functions, you'll find a
$b_debug_this_function = false line at the top of the function. Set this
to true for the functions you are debugging and you'll get a stack trace of the
execution as you go through the function. This method has proved
invaluable to me when working on other scripts.
PhpXml Version 1.N.2 - 16th July 2001
-
XPath literal strings: XPath literal strings were being
interpreted as XPath expressions.
PhpXml Version 1.N.1
-
Undefined index errors: It includes all of the Marko
Faldix mf@mrinfo.de undefined index
errors, and all of the Tim Strehle tim@digicol.de
undefined index errors. Also some I found too which could be
duplicates just fixed in a different way...
-
&apos
has been removed
-
get_file has been renamed to get_file_as_html, as get_file was a
somewhat misleading name for the function anyway. get_file_as_html() is
implemented by get_file_internal() which is mostly a copy of get_file.
-
get_file_as_xml
This is a function that will return you a buffer that can in its entirety be
dumped to an output file. It includes the <?xml version... ?> line
too. Very useful function. The support for this was added to
get_file_internal() so that it will quote < > and & only if it is
called from get_file_as_html.
-
stripslashes
I don't bother calling addslashes on all the data that the class holds, as this
just gets really inconvenient. xmlphp should be able to read in from file
and write out as xml without altering the logical content of the data, and
addslashes() prevents this. Addslashes is useful for data that you use
in php or javascript strings, but xml can handle ' " and \ so why escape
them? Consequently all addslashes and stripslashes calls have been
removed.
-
Output formatting
I've changed the rules for output formatting so as to preserve "content" more
effectively. Consequently if an element has only text content, it will be
written as "<Tag>Text</Tag>" as opposed to
"<Tag>\n\tText\n</Tag>\n". This makes for a more compact
output.
-
Whitespace is preserved
When space is read, it is added to the content of the node. It is only
when the end tag is read that whitespace is potentially discarded. Even
then whitespace is only discarded if the entire text content of the node is
whitespace. This strips "meaningless" whitespace that facilitates
formatting of the xml file, while preserving whitespace in text.
-
Boolean predicates didn't work The check_predicates
function didn't work with and and or because
a boolean return value was being interpreted as an integer and
treated as a position index instead of as true or false. This has been
fixed.
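The "Output formatting" rule in the 1.N.1 notes, together with the 1.N.5 "Text content is output in correct order" fix, can be illustrated with a small serializer. Text-only elements are written inline and element-only content is indented; as an added assumption of this sketch, mixed content is also written inline, children in their original order, so the text is neither reordered nor re-indented. Element, inline() and to_xml() are hypothetical names, and escaping of &, < and > is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Element:
    tag: str
    children: list[Union["Element", str]] = field(default_factory=list)


def inline(el: Element) -> str:
    """Serialise an element on one line, children in their original order."""
    inner = "".join(c if isinstance(c, str) else inline(c) for c in el.children)
    return f"<{el.tag}>{inner}</{el.tag}>"


def to_xml(el: Element, depth: int = 0, indent: str = "\t") -> str:
    pad = indent * depth
    if not el.children or any(isinstance(c, str) for c in el.children):
        # Empty, text-only or mixed content: keep it compact and untouched.
        return pad + inline(el) + "\n"
    # Element-only content: one indented child per line.
    body = "".join(to_xml(c, depth + 1, indent) for c in el.children)
    return f"{pad}<{el.tag}>\n{body}{pad}</{el.tag}>\n"


if __name__ == "__main__":
    p = Element("P", [" A ", Element("B", [" B "]), " C "])
    doc = Element("Doc", [Element("Tag", ["Text"]), p])
    print(to_xml(doc), end="")
    # <Doc>
    # 	<Tag>Text</Tag>
    # 	<P> A <B> B </B> C </P>
    # </Doc>
```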
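The "Whitespace is preserved" rule in the 1.N.1 notes (buffer all character data, then discard it at the end tag only if it is entirely whitespace) can be sketched with Python's xml.parsers.expat. collect_text() is a hypothetical helper, not part of the class, and for simplicity text that follows a child element is appended to the parent's buffer.

```python
import xml.parsers.expat


def collect_text(xml_text: str) -> list[tuple[str, str]]:
    """Return (element name, text) pairs, dropping whitespace-only text."""
    stack: list[list[str]] = []          # one text buffer per open element
    result: list[tuple[str, str]] = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name: str, attrs: dict) -> None:
        stack.append([])

    def chars(data: str) -> None:
        if stack:
            stack[-1].append(data)       # keep every chunk, whitespace included

    def end(name: str) -> None:
        text = "".join(stack.pop())
        if text.strip():                 # discard only if the whole text is whitespace
            result.append((name, text))

    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return result


if __name__ == "__main__":
    doc = "<Doc>\n  <A>  keep  me </A>\n  <B>\n   </B>\n</Doc>"
    print(collect_text(doc))  # [('A', '  keep  me ')]
```

The indentation inside Doc and B disappears, while the leading, trailing and doubled spaces inside A survive exactly as written.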
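The "Boolean predicates didn't work" bug is a type-dispatch mistake that is easy to reproduce in Python too, because bool is a subclass of int there (True == 1). In XPath 1.0 a numeric predicate result selects the node whose position equals it, while any other result is converted to a boolean. A Python sketch of that rule; predicate_matches() is a hypothetical name, and the order of the isinstance checks is the whole point.

```python
def predicate_matches(result: object, position: int) -> bool:
    """Decide whether the node at `position` (1-based) passes a predicate."""
    # Must come first: isinstance(True, int) is also true in Python.
    if isinstance(result, bool):
        return result
    if isinstance(result, (int, float)):
        return result == position   # [3] means [position() = 3]
    if isinstance(result, (str, list)):
        return len(result) > 0      # non-empty string or node set is true
    raise TypeError(f"unsupported predicate result: {result!r}")


if __name__ == "__main__":
    nodes = ["a", "b", "c"]
    print([n for i, n in enumerate(nodes, 1) if predicate_matches(True, i)])  # ['a', 'b', 'c']
    print([n for i, n in enumerate(nodes, 1) if predicate_matches(2, i)])     # ['b']
```

If the numeric check were moved above the boolean one, a true predicate would behave like [1] and select only the first node, which is the kind of symptom the entry describes.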