Update libs

This commit is contained in:
Ruud
2013-11-22 16:09:15 +01:00
parent f53364eb6c
commit 88d6148500
119 changed files with 18448 additions and 8923 deletions

View File

@@ -1,143 +0,0 @@
BitTorrent Open Source License
Version 1.1
This BitTorrent Open Source License (the "License") applies to the BitTorrent client and related software products as well as any updates or maintenance releases of that software ("BitTorrent Products") that are distributed by BitTorrent, Inc. ("Licensor"). Any BitTorrent Product licensed pursuant to this License is a Licensed Product. Licensed Product, in its entirety, is protected by U.S. copyright law. This License identifies the terms under which you may use, copy, distribute or modify Licensed Product.
Preamble
This Preamble is intended to describe, in plain English, the nature and scope of this License. However, this Preamble is not a part of this license. The legal effect of this License is dependent only upon the terms of the License and not this Preamble.
This License complies with the Open Source Definition and is derived from the Jabber Open Source License 1.0 (the "JOSL"), which has been approved by Open Source Initiative. Sections 4(c) and 4(f)(iii) from the JOSL have been deleted.
This License provides that:
1. You may use or give away the Licensed Product, alone or as a component of an aggregate software distribution containing programs from several different sources. No royalty or other fee is required.
2. Both Source Code and executable versions of the Licensed Product, including Modifications made by previous Contributors, are available for your use. (The terms "Licensed Product," "Modifications," "Contributors" and "Source Code" are defined in the License.)
3. You are allowed to make Modifications to the Licensed Product, and you can create Derivative Works from it. (The term "Derivative Works" is defined in the License.)
4. By accepting the Licensed Product under the provisions of this License, you agree that any Modifications you make to the Licensed Product and then distribute are governed by the provisions of this License. In particular, you must make the Source Code of your Modifications available to others free of charge and without a royalty.
5. You may sell, accept donations or otherwise receive compensation for executable versions of a Licensed Product, without paying a royalty or other fee to the Licensor or any Contributor, provided that such executable versions contain your or another Contributor?s material Modifications. For the avoidance of doubt, to the extent your executable version of a Licensed Product does not contain your or another Contributor?s material Modifications, you may not sell, accept donations or otherwise receive compensation for such executable.
You may use the Licensed Product for any purpose, but the Licensor is not providing you any warranty whatsoever, nor is the Licensor accepting any liability in the event that the Licensed Product doesn't work properly or causes you any injury or damages.
6. If you sublicense the Licensed Product or Derivative Works, you may charge fees for warranty or support, or for accepting indemnity or liability obligations to your customers. You cannot charge for, sell, accept donations or otherwise receive compensation for the Source Code.
7. If you assert any patent claims against the Licensor relating to the Licensed Product, or if you breach any terms of the License, your rights to the Licensed Product under this License automatically terminate.
You may use this License to distribute your own Derivative Works, in which case the provisions of this License will apply to your Derivative Works just as they do to the original Licensed Product.
Alternatively, you may distribute your Derivative Works under any other OSI-approved Open Source license, or under a proprietary license of your choice. If you use any license other than this License, however, you must continue to fulfill the requirements of this License (including the provisions relating to publishing the Source Code) for those portions of your Derivative Works that consist of the Licensed Product, including the files containing Modifications.
New versions of this License may be published from time to time in connection with new versions of a Licensed Product or otherwise. You may choose to continue to use the license terms in this version of the License for the Licensed Product that was originally licensed hereunder, however, the new versions of this License will at all times apply to new versions of the Licensed Product released by Licensor after the release of the new version of this License. Only the Licensor has the right to change the License terms as they apply to the Licensed Product.
This License relies on precise definitions for certain terms. Those terms are defined when they are first used, and the definitions are repeated for your convenience in a Glossary at the end of the License.
License Terms
1. Grant of License From Licensor. Subject to the terms and conditions of this License, Licensor hereby grants you a world-wide, royalty-free, non-exclusive license, subject to third party intellectual property claims, to do the following:
a. Use, reproduce, modify, display, perform, sublicense and distribute any Modifications created by a Contributor or portions thereof, in both Source Code or as an executable program, either on an unmodified basis or as part of Derivative Works.
b. Under claims of patents now or hereafter owned or controlled by Contributor, to make, use, sell, offer for sale, have made, and/or otherwise dispose of Modifications or portions thereof, but solely to the extent that any such claim is necessary to enable you to make, use, sell, offer for sale, have made, and/or otherwise dispose of Modifications or portions thereof or Derivative Works thereof.
2. Grant of License to Modifications From Contributor. "Modifications" means any additions to or deletions from the substance or structure of (i) a file containing a Licensed Product, or (ii) any new file that contains any part of a Licensed Product. Hereinafter in this License, the term "Licensed Product" shall include all previous Modifications that you receive from any Contributor. Subject to the terms and conditions of this License, By application of the provisions in Section 4(a) below, each person or entity who created or contributed to the creation of, and distributed, a Modification (a "Contributor") hereby grants you a world-wide, royalty-free, non-exclusive license, subject to third party intellectual property claims, to do the following:
a. Use, reproduce, modify, display, perform, sublicense and distribute any Modifications created by such Contributor or portions thereof, in both Source Code or as an executable program, either on an unmodified basis or as part of Derivative Works.
b. Under claims of patents now or hereafter owned or controlled by Contributor, to make, use, sell, offer for sale, have made, and/or otherwise dispose of Modifications or portions thereof, but solely to the extent that any such claim is necessary to enable you to make, use, sell, offer for sale, have made, and/or otherwise dispose of Modifications or portions thereof or Derivative Works thereof.
3. Exclusions From License Grant. Nothing in this License shall be deemed to grant any rights to trademarks, copyrights, patents, trade secrets or any other intellectual property of Licensor or any Contributor except as expressly stated herein. No patent license is granted separate from the Licensed Product, for code that you delete from the Licensed Product, or for combinations of the Licensed Product with other software or hardware. No right is granted to the trademarks of Licensor or any Contributor even if such marks are included in the Licensed Product. Nothing in this License shall be interpreted to prohibit Licensor from licensing under different terms from this License any code that Licensor otherwise would have a right to license. As an express condition for your use of the Licensed Product, you hereby agree that you will not, without the prior written consent of Licensor, use any trademarks, copyrights, patents, trade secrets or any other intellectual property of Licensor or any Contributor except as expressly stated herein. For the avoidance of doubt and without limiting the foregoing, you hereby agree that you will not use or display any trademark of Licensor or any Contributor in any domain name, directory filepath, advertisement, link or other reference to you in any manner or in any media.
4. Your Obligations Regarding Distribution.
a. Application of This License to Your Modifications. As an express condition for your use of the Licensed Product, you hereby agree that any Modifications that you create or to which you contribute, and which you distribute, are governed by the terms of this License including, without limitation, Section 2. Any Modifications that you create or to which you contribute may be distributed only under the terms of this License or a future version of this License released under Section 7. You must include a copy of this License with every copy of the Modifications you distribute. You agree not to offer or impose any terms on any Source Code or executable version of the Licensed Product or Modifications that alter or restrict the applicable version of this License or the recipients' rights hereunder. However, you may include an additional document offering the additional rights described in Section 4(d).
b. Availability of Source Code. You must make available, without charge, under the terms of this License, the Source Code of the Licensed Product and any Modifications that you distribute, either on the same media as you distribute any executable or other form of the Licensed Product, or via a mechanism generally accepted in the software development community for the electronic transfer of data (an "Electronic Distribution Mechanism"). The Source Code for any version of Licensed Product or Modifications that you distribute must remain available for as long as any executable or other form of the Licensed Product is distributed by you. You are responsible for ensuring that the Source Code version remains available even if the Electronic Distribution Mechanism is maintained by a third party.
c. Intellectual Property Matters.
i. Third Party Claims. If you have knowledge that a license to a third party's intellectual property right is required to exercise the rights granted by this License, you must include a text file with the Source Code distribution titled "LEGAL" that describes the claim and the party making the claim in sufficient detail that a recipient will know whom to contact. If you obtain such knowledge after you make any Modifications available as described in Section 4(b), you shall promptly modify the LEGAL file in all copies you make available thereafter and shall take other steps (such as notifying appropriate mailing lists or newsgroups) reasonably calculated to inform those who received the Licensed Product from you that new knowledge has been obtained.
ii. Contributor APIs. If your Modifications include an application programming interface ("API") and you have knowledge of patent licenses that are reasonably necessary to implement that API, you must also include this information in the LEGAL file.
iii. Representations. You represent that, except as disclosed pursuant to 4(c)(i) above, you believe that any Modifications you distribute are your original creations and that you have sufficient rights to grant the rights conveyed by this License.
d. Required Notices. You must duplicate this License in any documentation you provide along with the Source Code of any Modifications you create or to which you contribute, and which you distribute, wherever you describe recipients' rights relating to Licensed Product. You must duplicate the notice contained in Exhibit A (the "Notice") in each file of the Source Code of any copy you distribute of the Licensed Product. If you created a Modification, you may add your name as a Contributor to the Notice. If it is not possible to put the Notice in a particular Source Code file due to its structure, then you must include such Notice in a location (such as a relevant directory file) where a user would be likely to look for such a notice. You may choose to offer, and charge a fee for, warranty, support, indemnity or liability obligations to one or more recipients of Licensed Product. However, you may do so only on your own behalf, and not on behalf of the Licensor or any Contributor. You must make it clear that any such warranty, support, indemnity or liability obligation is offered by you alone, and you hereby agree to indemnify the Licensor and every Contributor for any liability incurred by the Licensor or such Contributor as a result of warranty, support, indemnity or liability terms you offer.
e. Distribution of Executable Versions. You may distribute Licensed Product as an executable program under a license of your choice that may contain terms different from this License provided (i) you have satisfied the requirements of Sections 4(a) through 4(d) for that distribution, (ii) you include a conspicuous notice in the executable version, related documentation and collateral materials stating that the Source Code version of the
Licensed Product is available under the terms of this License, including a description of how and where you have fulfilled the obligations of Section 4(b), and (iii) you make it clear that any terms that differ from this License are offered by you alone, not by Licensor or any Contributor. You hereby agree to indemnify the Licensor and every Contributor for any liability incurred by Licensor or such Contributor as a result of any terms you offer.
f. Distribution of Derivative Works. You may create Derivative Works (e.g., combinations of some or all of the Licensed Product with other code) and distribute the Derivative Works as products under any other license you select, with the proviso that the requirements of this License are fulfilled for those portions of the Derivative Works that consist of the Licensed Product or any Modifications thereto.
g. Compensation for Distribution of Executable Versions of Licensed Products, Modifications or Derivative Works. Notwithstanding any provision of this License to the contrary, by distributing, selling, licensing, sublicensing or otherwise making available any Licensed Product, or Modification or Derivative Work thereof, you and Licensor hereby acknowledge and agree that you may sell, license or sublicense for a fee, accept donations or otherwise receive compensation for executable versions of a Licensed Product, without paying a royalty or other fee to the Licensor or any other Contributor, provided that such executable versions (i) contain your or another Contributor?s material Modifications, or (ii) are otherwise material Derivative Works. For purposes of this License, an executable version of the Licensed Product will be deemed to contain a material Modification, or will otherwise be deemed a material Derivative Work, if (a) the Licensed Product is modified with your own or a third party?s software programs or other code, and/or the Licensed Product is combined with a number of your own or a third party?s software programs or code, respectively, and (b) such software programs or code add or contribute material value, functionality or features to the License Product. For the avoidance of doubt, to the extent your executable version of a Licensed Product does not contain your or another Contributor?s material Modifications or is otherwise not a material Derivative Work, in each case as contemplated herein, you may not sell, license or sublicense for a fee, accept donations or otherwise receive compensation for such executable. Additionally, without limitation of the foregoing and notwithstanding any provision of this License to the contrary, you cannot charge for, sell, license or sublicense for a fee, accept donations or otherwise receive compensation for the Source Code.
5. Inability to Comply Due to Statute or Regulation. If it is impossible for you to comply with any of the terms of this License with respect to some or all of the Licensed Product due to statute, judicial order, or regulation, then you must (i) comply with the terms of this License to the maximum extent possible, (ii) cite the statute or regulation that prohibits you from adhering to the License, and (iii) describe the limitations and the code they affect. Such description must be included in the LEGAL file described in Section 4(d), and must be included with all distributions of the Source Code. Except to the extent prohibited by statute or regulation, such description must be sufficiently detailed for a recipient of ordinary skill at computer programming to be able to understand it.
6. Application of This License. This License applies to code to which Licensor or Contributor has attached the Notice in Exhibit A, which is incorporated herein by this reference.
7. Versions of This License.
a. New Versions. Licensor may publish from time to time revised and/or new versions of the License.
b. Effect of New Versions. Once Licensed Product has been published under a particular version of the License, you may always continue to use it under the terms of that version, provided that any such license be in full force and effect at the time, and has not been revoked or otherwise terminated. You may also choose to use such Licensed Product under the terms of any subsequent version (but not any prior version) of the License published by Licensor. No one other than Licensor has the right to modify the terms applicable to Licensed Product created under this License.
c. Derivative Works of this License. If you create or use a modified version of this License, which you may do only in order to apply it to software that is not already a Licensed Product under this License, you must rename your license so that it is not confusingly similar to this License, and must make it clear that your license contains terms that differ from this License. In so naming your license, you may not use any trademark of Licensor or any Contributor.
8. Disclaimer of Warranty. LICENSED PRODUCT IS PROVIDED UNDER THIS LICENSE ON AN AS IS BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES THAT THE LICENSED PRODUCT IS FREE OF DEFECTS, MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE OR NON-INFRINGING. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LICENSED PRODUCT IS WITH YOU. SHOULD LICENSED PRODUCT PROVE DEFECTIVE IN ANY RESPECT, YOU (AND NOT THE LICENSOR OR ANY OTHER CONTRIBUTOR) ASSUME THE COST OF ANY NECESSARY SERVICING, REPAIR OR CORRECTION. THIS
DISCLAIMER OF WARRANTY CONSTITUTES AN ESSENTIAL PART OF THIS LICENSE. NO USE OF LICENSED PRODUCT IS AUTHORIZED HEREUNDER EXCEPT UNDER THIS DISCLAIMER.
9. Termination.
a. Automatic Termination Upon Breach. This license and the rights granted hereunder will terminate automatically if you fail to comply with the terms herein and fail to cure such breach within ten (10) days of being notified of the breach by the Licensor. For purposes of this provision, proof of delivery via email to the address listed in the ?WHOIS? database of the registrar for any website through which you distribute or market any Licensed Product, or to any alternate email address which you designate in writing to the Licensor, shall constitute sufficient notification. All sublicenses to the Licensed Product that are properly granted shall survive any termination of this license so long as they continue to complye with the terms of this License. Provisions that, by their nature, must remain in effect beyond the termination of this License, shall survive.
b. Termination Upon Assertion of Patent Infringement. If you initiate litigation by asserting a patent infringement claim (excluding declaratory judgment actions) against Licensor or a Contributor (Licensor or Contributor against whom you file such an action is referred to herein as Respondent) alleging that Licensed Product directly or indirectly infringes any patent, then any and all rights granted by such Respondent to you under Sections 1 or 2 of this License shall terminate prospectively upon sixty (60) days notice from Respondent (the "Notice Period") unless within that Notice Period you either agree in writing (i) to pay Respondent a mutually agreeable reasonably royalty for your past or future use of Licensed Product made by such Respondent, or (ii) withdraw your litigation claim with respect to Licensed Product against such Respondent. If within said Notice Period a reasonable royalty and payment arrangement are not mutually agreed upon in writing by the parties or the litigation claim is not withdrawn, the rights granted by Licensor to you under Sections 1 and 2 automatically terminate at the expiration of said Notice Period.
c. Reasonable Value of This License. If you assert a patent infringement claim against Respondent alleging that Licensed Product directly or indirectly infringes any patent where such claim is resolved (such as by license or settlement) prior to the initiation of patent infringement litigation, then the reasonable value of the licenses granted by said Respondent under Sections 1 and 2 shall be taken into account in determining the amount or value of any payment or license.
d. No Retroactive Effect of Termination. In the event of termination under Sections 9(a) or 9(b) above, all end user license agreements (excluding licenses to distributors and resellers) that have been validly granted by you or any distributor hereunder prior to termination shall survive termination.
10. Limitation of Liability. UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL THEORY, WHETHER TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE, SHALL THE LICENSOR, ANY CONTRIBUTOR, OR ANY DISTRIBUTOR OF LICENSED PRODUCT, OR ANY SUPPLIER OF ANY OF SUCH PARTIES, BE LIABLE TO ANY PERSON FOR ANY INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF GOODWILL, WORK STOPPAGE, COMPUTER FAILURE OR MALFUNCTION, OR ANY AND ALL OTHER COMMERCIAL DAMAGES OR LOSSES, EVEN IF SUCH PARTY SHALL HAVE BEEN INFORMED OF THE POSSIBILITY OF SUCH DAMAGES. THIS LIMITATION OF LIABILITY SHALL NOT APPLY TO LIABILITY FOR DEATH OR PERSONAL INJURY RESULTING FROM SUCH PARTYS NEGLIGENCE TO THE EXTENT APPLICABLE LAW PROHIBITS SUCH LIMITATION. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OR LIMITATION OF INCIDENTAL OR CONSEQUENTIAL DAMAGES, SO THIS EXCLUSION AND LIMITATION MAY NOT APPLY TO YOU.
11. Responsibility for Claims. As between Licensor and Contributors, each party is responsible for claims and damages arising, directly or indirectly, out of its utilization of rights under this License. You agree to work with Licensor and Contributors to distribute such responsibility on an equitable basis. Nothing herein is intended or shall be deemed to constitute any admission of liability.
12. U.S. Government End Users. The Licensed Product is a commercial item, as that term is defined in 48 C.F.R. 2.101 (Oct. 1995), consisting of commercial computer software and commercial computer software documentation, as such terms are used in 48 C.F.R. 12.212 (Sept. 1995). Consistent with 48 C.F.R. 12.212 and 48 C.F.R. 227.7202-1 through 227.7202-4 (June 1995), all U.S. Government End Users acquire Licensed Product with only those rights set forth herein.
13. Miscellaneous. This License represents the complete agreement concerning the subject matter hereof. If any provision of this License is held to be unenforceable, such provision shall be reformed only to the extent necessary to make it enforceable. This License shall be governed by California law provisions (except to the extent applicable law, if any, provides otherwise), excluding its conflict-of-law provisions. You expressly agree that in any litigation relating to this license the losing party shall be responsible for costs including, without limitation, court costs and reasonable attorneys fees and expenses. The application of the United Nations Convention on Contracts for the International Sale of Goods is expressly excluded. Any law or regulation that provides that the language of a contract shall be construed against the drafter shall not apply to this License.
14. Definition of You in This License. You throughout this License, whether in upper or lower case, means an individual or a legal entity exercising rights under, and complying with all of the terms of, this License or a future version of this License issued under Section 7. For legal entities, you includes any entity that controls, is controlled by, is under common control with, or affiliated with, you. For purposes of this definition, control means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. You are responsible for advising any affiliated entity of the terms of this License, and that any rights or privileges derived from or obtained by way of this License are subject to the restrictions outlined herein.
15. Glossary. All defined terms in this License that are used in more than one Section of this License are repeated here, in alphabetical order, for the convenience of the reader. The Section of this License in which each defined term is first used is shown in parentheses.
Contributor: Each person or entity who created or contributed to the creation of, and distributed, a Modification. (See Section 2)
Derivative Works: That term as used in this License is defined under U.S. copyright law. (See Section 1(b))
License: This BitTorrent Open Source License. (See first paragraph of License)
Licensed Product: Any BitTorrent Product licensed pursuant to this License. The term "Licensed Product" includes all previous Modifications from any Contributor that you receive. (See first paragraph of License and Section 2)
Licensor: BitTorrent, Inc. (See first paragraph of License)
Modifications: Any additions to or deletions from the substance or structure of (i) a file containing Licensed Product, or (ii) any new file that contains any part of Licensed Product. (See Section 2)
Notice: The notice contained in Exhibit A. (See Section 4(e))
Source Code: The preferred form for making modifications to the Licensed Product, including all modules contained therein, plus any associated interface definition files, scripts used to control compilation and installation of an executable program, or a list of differential comparisons against the Source Code of the Licensed Product. (See Section 1(a))
You: This term is defined in Section 14 of this License.
EXHIBIT A
The Notice below must appear in each file of the Source Code of any copy you distribute of the Licensed Product or any hereto. Contributors to any Modifications may add their own copyright notices to identify their own contributions.
License:
The contents of this file are subject to the BitTorrent Open Source License Version 1.0 (the License). You may not copy or use this file, in either source code or executable form, except in compliance with the License. You may obtain a copy of the License at http://www.bittorrent.com/license/.
Software distributed under the License is distributed on an AS IS basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License.

View File

@@ -1 +1,131 @@
from bencode import *
# The contents of this file are subject to the BitTorrent Open Source License
# Version 1.1 (the License). You may not copy or use this file, in either
# source code or executable form, except in compliance with the License. You
# may obtain a copy of the License at http://www.bittorrent.com/license/.
#
# Software distributed under the License is distributed on an AS IS basis,
# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
# for the specific language governing rights and limitations under the
# License.
# Written by Petru Paler
from BTL import BTFailure
def decode_int(x, f):
f += 1
newf = x.index('e', f)
n = int(x[f:newf])
if x[f] == '-':
if x[f + 1] == '0':
raise ValueError
elif x[f] == '0' and newf != f+1:
raise ValueError
return (n, newf+1)
def decode_string(x, f):
colon = x.index(':', f)
n = int(x[f:colon])
if x[f] == '0' and colon != f+1:
raise ValueError
colon += 1
return (x[colon:colon+n], colon+n)
def decode_list(x, f):
r, f = [], f+1
while x[f] != 'e':
v, f = decode_func[x[f]](x, f)
r.append(v)
return (r, f + 1)
def decode_dict(x, f):
r, f = {}, f+1
while x[f] != 'e':
k, f = decode_string(x, f)
r[k], f = decode_func[x[f]](x, f)
return (r, f + 1)
decode_func = {}
decode_func['l'] = decode_list
decode_func['d'] = decode_dict
decode_func['i'] = decode_int
decode_func['0'] = decode_string
decode_func['1'] = decode_string
decode_func['2'] = decode_string
decode_func['3'] = decode_string
decode_func['4'] = decode_string
decode_func['5'] = decode_string
decode_func['6'] = decode_string
decode_func['7'] = decode_string
decode_func['8'] = decode_string
decode_func['9'] = decode_string
def bdecode(x):
try:
r, l = decode_func[x[0]](x, 0)
except (IndexError, KeyError, ValueError):
raise BTFailure("not a valid bencoded string")
if l != len(x):
raise BTFailure("invalid bencoded value (data after valid prefix)")
return r
from types import StringType, IntType, LongType, DictType, ListType, TupleType
class Bencached(object):
__slots__ = ['bencoded']
def __init__(self, s):
self.bencoded = s
def encode_bencached(x,r):
r.append(x.bencoded)
def encode_int(x, r):
r.extend(('i', str(x), 'e'))
def encode_bool(x, r):
if x:
encode_int(1, r)
else:
encode_int(0, r)
def encode_string(x, r):
r.extend((str(len(x)), ':', x))
def encode_list(x, r):
r.append('l')
for i in x:
encode_func[type(i)](i, r)
r.append('e')
def encode_dict(x,r):
r.append('d')
ilist = x.items()
ilist.sort()
for k, v in ilist:
r.extend((str(len(k)), ':', k))
encode_func[type(v)](v, r)
r.append('e')
encode_func = {}
encode_func[Bencached] = encode_bencached
encode_func[IntType] = encode_int
encode_func[LongType] = encode_int
encode_func[StringType] = encode_string
encode_func[ListType] = encode_list
encode_func[TupleType] = encode_list
encode_func[DictType] = encode_dict
try:
from types import BooleanType
encode_func[BooleanType] = encode_bool
except ImportError:
pass
def bencode(x):
r = []
encode_func[type(x)](x, r)
return ''.join(r)

View File

@@ -1,131 +0,0 @@
# The contents of this file are subject to the BitTorrent Open Source License
# Version 1.1 (the License). You may not copy or use this file, in either
# source code or executable form, except in compliance with the License. You
# may obtain a copy of the License at http://www.bittorrent.com/license/.
#
# Software distributed under the License is distributed on an AS IS basis,
# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
# for the specific language governing rights and limitations under the
# License.
# Written by Petru Paler
from BTL import BTFailure
def decode_int(x, f):
f += 1
newf = x.index('e', f)
n = int(x[f:newf])
if x[f] == '-':
if x[f + 1] == '0':
raise ValueError
elif x[f] == '0' and newf != f+1:
raise ValueError
return (n, newf+1)
def decode_string(x, f):
colon = x.index(':', f)
n = int(x[f:colon])
if x[f] == '0' and colon != f+1:
raise ValueError
colon += 1
return (x[colon:colon+n], colon+n)
def decode_list(x, f):
r, f = [], f+1
while x[f] != 'e':
v, f = decode_func[x[f]](x, f)
r.append(v)
return (r, f + 1)
def decode_dict(x, f):
r, f = {}, f+1
while x[f] != 'e':
k, f = decode_string(x, f)
r[k], f = decode_func[x[f]](x, f)
return (r, f + 1)
decode_func = {}
decode_func['l'] = decode_list
decode_func['d'] = decode_dict
decode_func['i'] = decode_int
decode_func['0'] = decode_string
decode_func['1'] = decode_string
decode_func['2'] = decode_string
decode_func['3'] = decode_string
decode_func['4'] = decode_string
decode_func['5'] = decode_string
decode_func['6'] = decode_string
decode_func['7'] = decode_string
decode_func['8'] = decode_string
decode_func['9'] = decode_string
def bdecode(x):
try:
r, l = decode_func[x[0]](x, 0)
except (IndexError, KeyError, ValueError):
raise BTFailure("not a valid bencoded string")
if l != len(x):
raise BTFailure("invalid bencoded value (data after valid prefix)")
return r
from types import StringType, IntType, LongType, DictType, ListType, TupleType
class Bencached(object):
__slots__ = ['bencoded']
def __init__(self, s):
self.bencoded = s
def encode_bencached(x,r):
r.append(x.bencoded)
def encode_int(x, r):
r.extend(('i', str(x), 'e'))
def encode_bool(x, r):
if x:
encode_int(1, r)
else:
encode_int(0, r)
def encode_string(x, r):
r.extend((str(len(x)), ':', x))
def encode_list(x, r):
r.append('l')
for i in x:
encode_func[type(i)](i, r)
r.append('e')
def encode_dict(x,r):
r.append('d')
ilist = x.items()
ilist.sort()
for k, v in ilist:
r.extend((str(len(k)), ':', k))
encode_func[type(v)](v, r)
r.append('e')
encode_func = {}
encode_func[Bencached] = encode_bencached
encode_func[IntType] = encode_int
encode_func[LongType] = encode_int
encode_func[StringType] = encode_string
encode_func[ListType] = encode_list
encode_func[TupleType] = encode_list
encode_func[DictType] = encode_dict
try:
from types import BooleanType
encode_func[BooleanType] = encode_bool
except ImportError:
pass
def bencode(x):
r = []
encode_func[type(x)](x, r)
return ''.join(r)

View File

@@ -17,16 +17,17 @@ http://www.crummy.com/software/BeautifulSoup/bs4/doc/
"""
__author__ = "Leonard Richardson (leonardr@segfault.org)"
__version__ = "4.1.0"
__copyright__ = "Copyright (c) 2004-2012 Leonard Richardson"
__version__ = "4.3.2"
__copyright__ = "Copyright (c) 2004-2013 Leonard Richardson"
__license__ = "MIT"
__all__ = ['BeautifulSoup']
import os
import re
import warnings
from .builder import builder_registry
from .builder import builder_registry, ParserRejectedMarkup
from .dammit import UnicodeDammit
from .element import (
CData,
@@ -74,11 +75,7 @@ class BeautifulSoup(Tag):
# want, look for one with these features.
DEFAULT_BUILDER_FEATURES = ['html', 'fast']
# Used when determining whether a text node is all whitespace and
# can be replaced with a single space. A text node that contains
# fancy Unicode spaces (usually non-breaking) should be left
# alone.
STRIP_ASCII_SPACES = {9: None, 10: None, 12: None, 13: None, 32: None, }
ASCII_SPACES = '\x20\x0a\x09\x0c\x0d'
def __init__(self, markup="", features=None, builder=None,
parse_only=None, from_encoding=None, **kwargs):
@@ -149,7 +146,7 @@ class BeautifulSoup(Tag):
features = self.DEFAULT_BUILDER_FEATURES
builder_class = builder_registry.lookup(*features)
if builder_class is None:
raise ValueError(
raise FeatureNotFound(
"Couldn't find a tree builder with the features you "
"requested: %s. Do you need to install a parser library?"
% ",".join(features))
@@ -160,18 +157,46 @@ class BeautifulSoup(Tag):
self.parse_only = parse_only
self.reset()
if hasattr(markup, 'read'): # It's a file-type object.
markup = markup.read()
(self.markup, self.original_encoding, self.declared_html_encoding,
self.contains_replacement_characters) = (
self.builder.prepare_markup(markup, from_encoding))
elif len(markup) <= 256:
# Print out warnings for a couple beginner problems
# involving passing non-markup to Beautiful Soup.
# Beautiful Soup will still parse the input as markup,
# just in case that's what the user really wants.
if (isinstance(markup, unicode)
and not os.path.supports_unicode_filenames):
possible_filename = markup.encode("utf8")
else:
possible_filename = markup
is_file = False
try:
is_file = os.path.exists(possible_filename)
except Exception, e:
# This is almost certainly a problem involving
# characters not valid in filenames on this
# system. Just let it go.
pass
if is_file:
warnings.warn(
'"%s" looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup.' % markup)
if markup[:5] == "http:" or markup[:6] == "https:":
# TODO: This is ugly but I couldn't get it to work in
# Python 3 otherwise.
if ((isinstance(markup, bytes) and not b' ' in markup)
or (isinstance(markup, unicode) and not u' ' in markup)):
warnings.warn(
'"%s" looks like a URL. Beautiful Soup is not an HTTP client. You should probably use an HTTP client to get the document behind the URL, and feed that document to Beautiful Soup.' % markup)
try:
self._feed()
except StopParsing:
pass
for (self.markup, self.original_encoding, self.declared_html_encoding,
self.contains_replacement_characters) in (
self.builder.prepare_markup(markup, from_encoding)):
self.reset()
try:
self._feed()
break
except ParserRejectedMarkup:
pass
# Clear out the markup and remove the builder's circular
# reference to this object.
@@ -192,29 +217,32 @@ class BeautifulSoup(Tag):
Tag.__init__(self, self, self.builder, self.ROOT_TAG_NAME)
self.hidden = 1
self.builder.reset()
self.currentData = []
self.current_data = []
self.currentTag = None
self.tagStack = []
self.preserve_whitespace_tag_stack = []
self.pushTag(self)
def new_tag(self, name, namespace=None, nsprefix=None, **attrs):
"""Create a new tag associated with this soup."""
return Tag(None, self.builder, name, namespace, nsprefix, attrs)
def new_string(self, s):
def new_string(self, s, subclass=NavigableString):
"""Create a new NavigableString associated with this soup."""
navigable = NavigableString(s)
navigable = subclass(s)
navigable.setup()
return navigable
def insert_before(self, successor):
raise ValueError("BeautifulSoup objects don't support insert_before().")
raise NotImplementedError("BeautifulSoup objects don't support insert_before().")
def insert_after(self, successor):
raise ValueError("BeautifulSoup objects don't support insert_after().")
raise NotImplementedError("BeautifulSoup objects don't support insert_after().")
def popTag(self):
tag = self.tagStack.pop()
if self.preserve_whitespace_tag_stack and tag == self.preserve_whitespace_tag_stack[-1]:
self.preserve_whitespace_tag_stack.pop()
#print "Pop", tag.name
if self.tagStack:
self.currentTag = self.tagStack[-1]
@@ -226,32 +254,49 @@ class BeautifulSoup(Tag):
self.currentTag.contents.append(tag)
self.tagStack.append(tag)
self.currentTag = self.tagStack[-1]
if tag.name in self.builder.preserve_whitespace_tags:
self.preserve_whitespace_tag_stack.append(tag)
def endData(self, containerClass=NavigableString):
if self.currentData:
currentData = u''.join(self.currentData)
if (currentData.translate(self.STRIP_ASCII_SPACES) == '' and
not set([tag.name for tag in self.tagStack]).intersection(
self.builder.preserve_whitespace_tags)):
if '\n' in currentData:
currentData = '\n'
else:
currentData = ' '
self.currentData = []
if self.current_data:
current_data = u''.join(self.current_data)
# If whitespace is not preserved, and this string contains
# nothing but ASCII spaces, replace it with a single space
# or newline.
if not self.preserve_whitespace_tag_stack:
strippable = True
for i in current_data:
if i not in self.ASCII_SPACES:
strippable = False
break
if strippable:
if '\n' in current_data:
current_data = '\n'
else:
current_data = ' '
# Reset the data collector.
self.current_data = []
# Should we add this string to the tree at all?
if self.parse_only and len(self.tagStack) <= 1 and \
(not self.parse_only.text or \
not self.parse_only.search(currentData)):
not self.parse_only.search(current_data)):
return
o = containerClass(currentData)
o = containerClass(current_data)
self.object_was_parsed(o)
def object_was_parsed(self, o):
def object_was_parsed(self, o, parent=None, most_recent_element=None):
"""Add an object to the parse tree."""
o.setup(self.currentTag, self.previous_element)
if self.previous_element:
self.previous_element.next_element = o
self.previous_element = o
self.currentTag.contents.append(o)
parent = parent or self.currentTag
most_recent_element = most_recent_element or self._most_recent_element
o.setup(parent, most_recent_element)
if most_recent_element is not None:
most_recent_element.next_element = o
self._most_recent_element = o
parent.contents.append(o)
def _popToTag(self, name, nsprefix=None, inclusivePop=True):
"""Pops the tag stack up to and including the most recent
@@ -260,22 +305,21 @@ class BeautifulSoup(Tag):
the given tag."""
#print "Popping to %s" % name
if name == self.ROOT_TAG_NAME:
# The BeautifulSoup object itself can never be popped.
return
numPops = 0
mostRecentTag = None
most_recently_popped = None
for i in range(len(self.tagStack) - 1, 0, -1):
if (name == self.tagStack[i].name
and nsprefix == self.tagStack[i].nsprefix == nsprefix):
numPops = len(self.tagStack) - i
stack_size = len(self.tagStack)
for i in range(stack_size - 1, 0, -1):
t = self.tagStack[i]
if (name == t.name and nsprefix == t.prefix):
if inclusivePop:
most_recently_popped = self.popTag()
break
if not inclusivePop:
numPops = numPops - 1
most_recently_popped = self.popTag()
for i in range(0, numPops):
mostRecentTag = self.popTag()
return mostRecentTag
return most_recently_popped
def handle_starttag(self, name, namespace, nsprefix, attrs):
"""Push a start tag on to the stack.
@@ -295,12 +339,12 @@ class BeautifulSoup(Tag):
return None
tag = Tag(self, self.builder, name, namespace, nsprefix, attrs,
self.currentTag, self.previous_element)
self.currentTag, self._most_recent_element)
if tag is None:
return tag
if self.previous_element:
self.previous_element.next_element = tag
self.previous_element = tag
if self._most_recent_element:
self._most_recent_element.next_element = tag
self._most_recent_element = tag
self.pushTag(tag)
return tag
@@ -310,7 +354,7 @@ class BeautifulSoup(Tag):
self._popToTag(name, nsprefix)
def handle_data(self, data):
self.currentData.append(data)
self.current_data.append(data)
def decode(self, pretty_print=False,
eventual_encoding=DEFAULT_OUTPUT_ENCODING,
@@ -333,6 +377,10 @@ class BeautifulSoup(Tag):
return prefix + super(BeautifulSoup, self).decode(
indent_level, eventual_encoding, formatter)
# Alias to make it easier to type import: 'from bs4 import _soup'
_s = BeautifulSoup
_soup = BeautifulSoup
class BeautifulStoneSoup(BeautifulSoup):
"""Deprecated interface to an XML parser."""
@@ -347,6 +395,9 @@ class BeautifulStoneSoup(BeautifulSoup):
class StopParsing(Exception):
pass
class FeatureNotFound(ValueError):
pass
#By default, act as an HTML pretty-printer.
if __name__ == '__main__':

View File

@@ -147,18 +147,29 @@ class TreeBuilder(object):
Modifies its input in place.
"""
if not attrs:
return attrs
if self.cdata_list_attributes:
universal = self.cdata_list_attributes.get('*', [])
tag_specific = self.cdata_list_attributes.get(
tag_name.lower(), [])
for cdata_list_attr in itertools.chain(universal, tag_specific):
if cdata_list_attr in dict(attrs):
# Basically, we have a "class" attribute whose
# value is a whitespace-separated list of CSS
# classes. Split it into a list.
value = attrs[cdata_list_attr]
values = whitespace_re.split(value)
attrs[cdata_list_attr] = values
tag_name.lower(), None)
for attr in attrs.keys():
if attr in universal or (tag_specific and attr in tag_specific):
# We have a "class"-type attribute whose string
# value is a whitespace-separated list of
# values. Split it into a list.
value = attrs[attr]
if isinstance(value, basestring):
values = whitespace_re.split(value)
else:
# html5lib sometimes calls setAttributes twice
# for the same tag when rearranging the parse
# tree. On the second call the attribute value
# here is already a list. If this happens,
# leave the value alone rather than trying to
# split it again.
values = value
attrs[attr] = values
return attrs
class SAXTreeBuilder(TreeBuilder):
@@ -287,6 +298,9 @@ def register_treebuilders_from(module):
# Register the builder while we're at it.
this_module.builder_registry.register(obj)
class ParserRejectedMarkup(Exception):
pass
# Builders are registered in reverse order of priority, so that custom
# builder registrations will take precedence. In general, we want lxml
# to take precedence over html5lib, because it's faster. And we only

View File

@@ -27,7 +27,7 @@ class HTML5TreeBuilder(HTMLTreeBuilder):
def prepare_markup(self, markup, user_specified_encoding):
# Store the user-specified encoding for use later on.
self.user_specified_encoding = user_specified_encoding
return markup, None, None, False
yield (markup, None, None, False)
# These methods are defined by Beautiful Soup.
def feed(self, markup):
@@ -123,17 +123,50 @@ class Element(html5lib.treebuilders._base.Node):
self.namespace = namespace
def appendChild(self, node):
if (node.element.__class__ == NavigableString and self.element.contents
string_child = child = None
if isinstance(node, basestring):
# Some other piece of code decided to pass in a string
# instead of creating a TextElement object to contain the
# string.
string_child = child = node
elif isinstance(node, Tag):
# Some other piece of code decided to pass in a Tag
# instead of creating an Element object to contain the
# Tag.
child = node
elif node.element.__class__ == NavigableString:
string_child = child = node.element
else:
child = node.element
if not isinstance(child, basestring) and child.parent is not None:
node.element.extract()
if (string_child and self.element.contents
and self.element.contents[-1].__class__ == NavigableString):
# Concatenate new text onto old text node
# XXX This has O(n^2) performance, for input like
# We are appending a string onto another string.
# TODO This has O(n^2) performance, for input like
# "a</a>a</a>a</a>..."
old_element = self.element.contents[-1]
new_element = self.soup.new_string(old_element + node.element)
new_element = self.soup.new_string(old_element + string_child)
old_element.replace_with(new_element)
self.soup._most_recent_element = new_element
else:
self.element.append(node.element)
node.parent = self
if isinstance(node, basestring):
# Create a brand new NavigableString from this string.
child = self.soup.new_string(node)
# Tell Beautiful Soup to act as if it parsed this element
# immediately after the parent's last descendant. (Or
# immediately after the parent, if it has no children.)
if self.element.contents:
most_recent_element = self.element._last_descendant(False)
else:
most_recent_element = self.element
self.soup.object_was_parsed(
child, parent=self.element,
most_recent_element=most_recent_element)
def getAttributes(self):
return AttrList(self.element)
@@ -162,11 +195,11 @@ class Element(html5lib.treebuilders._base.Node):
attributes = property(getAttributes, setAttributes)
def insertText(self, data, insertBefore=None):
text = TextNode(self.soup.new_string(data), self.soup)
if insertBefore:
self.insertBefore(text, insertBefore)
text = TextNode(self.soup.new_string(data), self.soup)
self.insertBefore(data, insertBefore)
else:
self.appendChild(text)
self.appendChild(data)
def insertBefore(self, node, refNode):
index = self.element.index(refNode.element)
@@ -183,16 +216,46 @@ class Element(html5lib.treebuilders._base.Node):
def removeChild(self, node):
node.element.extract()
def reparentChildren(self, newParent):
while self.element.contents:
child = self.element.contents[0]
child.extract()
if isinstance(child, Tag):
newParent.appendChild(
Element(child, self.soup, namespaces["html"]))
else:
newParent.appendChild(
TextNode(child, self.soup))
def reparentChildren(self, new_parent):
"""Move all of this tag's children into another tag."""
element = self.element
new_parent_element = new_parent.element
# Determine what this tag's next_element will be once all the children
# are removed.
final_next_element = element.next_sibling
new_parents_last_descendant = new_parent_element._last_descendant(False, False)
if len(new_parent_element.contents) > 0:
# The new parent already contains children. We will be
# appending this tag's children to the end.
new_parents_last_child = new_parent_element.contents[-1]
new_parents_last_descendant_next_element = new_parents_last_descendant.next_element
else:
# The new parent contains no children.
new_parents_last_child = None
new_parents_last_descendant_next_element = new_parent_element.next_element
to_append = element.contents
append_after = new_parent.element.contents
if len(to_append) > 0:
# Set the first child's previous_element and previous_sibling
# to elements within the new parent
first_child = to_append[0]
first_child.previous_element = new_parents_last_descendant
first_child.previous_sibling = new_parents_last_child
# Fix the last child's next_element and next_sibling
last_child = to_append[-1]
last_child.next_element = new_parents_last_descendant_next_element
last_child.next_sibling = None
for child in to_append:
child.parent = new_parent_element
new_parent_element.contents.append(child)
# Now that this element has no children, change its .next_element.
element.contents = []
element.next_element = final_next_element
def cloneNode(self):
tag = self.soup.new_tag(self.element.name, self.namespace)

View File

@@ -45,7 +45,15 @@ HTMLPARSER = 'html.parser'
class BeautifulSoupHTMLParser(HTMLParser):
def handle_starttag(self, name, attrs):
# XXX namespace
self.soup.handle_starttag(name, None, None, dict(attrs))
attr_dict = {}
for key, value in attrs:
# Change None attribute values to the empty string
# for consistency with the other tree builders.
if value is None:
value = ''
attr_dict[key] = value
attrvalue = '""'
self.soup.handle_starttag(name, None, None, attr_dict)
def handle_endtag(self, name):
self.soup.handle_endtag(name)
@@ -58,6 +66,8 @@ class BeautifulSoupHTMLParser(HTMLParser):
# it's fixed.
if name.startswith('x'):
real_name = int(name.lstrip('x'), 16)
elif name.startswith('X'):
real_name = int(name.lstrip('X'), 16)
else:
real_name = int(name)
@@ -85,6 +95,9 @@ class BeautifulSoupHTMLParser(HTMLParser):
self.soup.endData()
if data.startswith("DOCTYPE "):
data = data[len("DOCTYPE "):]
elif data == 'DOCTYPE':
# i.e. "<!DOCTYPE>"
data = ''
self.soup.handle_data(data)
self.soup.endData(Doctype)
@@ -130,13 +143,14 @@ class HTMLParserTreeBuilder(HTMLTreeBuilder):
replaced with REPLACEMENT CHARACTER).
"""
if isinstance(markup, unicode):
return markup, None, None, False
yield (markup, None, None, False)
return
try_encodings = [user_specified_encoding, document_declared_encoding]
dammit = UnicodeDammit(markup, try_encodings, is_html=True)
return (dammit.markup, dammit.original_encoding,
dammit.declared_html_encoding,
dammit.contains_replacement_characters)
yield (dammit.markup, dammit.original_encoding,
dammit.declared_html_encoding,
dammit.contains_replacement_characters)
def feed(self, markup):
args, kwargs = self.parser_args

View File

@@ -3,6 +3,7 @@ __all__ = [
'LXMLTreeBuilder',
]
from io import BytesIO
from StringIO import StringIO
import collections
from lxml import etree
@@ -12,9 +13,10 @@ from bs4.builder import (
HTML,
HTMLTreeBuilder,
PERMISSIVE,
ParserRejectedMarkup,
TreeBuilder,
XML)
from bs4.dammit import UnicodeDammit
from bs4.dammit import EncodingDetector
LXML = 'lxml'
@@ -28,24 +30,36 @@ class LXMLTreeBuilderForXML(TreeBuilder):
CHUNK_SIZE = 512
@property
def default_parser(self):
# This namespace mapping is specified in the XML Namespace
# standard.
DEFAULT_NSMAPS = {'http://www.w3.org/XML/1998/namespace' : "xml"}
def default_parser(self, encoding):
# This can either return a parser object or a class, which
# will be instantiated with default arguments.
return etree.XMLParser(target=self, strip_cdata=False, recover=True)
if self._default_parser is not None:
return self._default_parser
return etree.XMLParser(
target=self, strip_cdata=False, recover=True, encoding=encoding)
def parser_for(self, encoding):
# Use the default parser.
parser = self.default_parser(encoding)
def __init__(self, parser=None, empty_element_tags=None):
if empty_element_tags is not None:
self.empty_element_tags = set(empty_element_tags)
if parser is None:
# Use the default parser.
parser = self.default_parser
if isinstance(parser, collections.Callable):
# Instantiate the parser with default arguments
parser = parser(target=self, strip_cdata=False)
self.parser = parser
parser = parser(target=self, strip_cdata=False, encoding=encoding)
return parser
def __init__(self, parser=None, empty_element_tags=None):
# TODO: Issue a warning if parser is present but not a
# callable, since that means there's no way to create new
# parsers for different encodings.
self._default_parser = parser
if empty_element_tags is not None:
self.empty_element_tags = set(empty_element_tags)
self.soup = None
self.nsmaps = None
self.nsmaps = [self.DEFAULT_NSMAPS]
def _getNsTag(self, tag):
# Split the namespace URL out of a fully-qualified lxml tag
@@ -58,50 +72,69 @@ class LXMLTreeBuilderForXML(TreeBuilder):
def prepare_markup(self, markup, user_specified_encoding=None,
document_declared_encoding=None):
"""
:return: A 3-tuple (markup, original encoding, encoding
declared within markup).
:yield: A series of 4-tuples.
(markup, encoding, declared encoding,
has undergone character replacement)
Each 4-tuple represents a strategy for parsing the document.
"""
if isinstance(markup, unicode):
return markup, None, None, False
# We were given Unicode. Maybe lxml can parse Unicode on
# this system?
yield markup, None, document_declared_encoding, False
if isinstance(markup, unicode):
# No, apparently not. Convert the Unicode to UTF-8 and
# tell lxml to parse it as UTF-8.
yield (markup.encode("utf8"), "utf8",
document_declared_encoding, False)
# Instead of using UnicodeDammit to convert the bytestring to
# Unicode using different encodings, use EncodingDetector to
# iterate over the encodings, and tell lxml to try to parse
# the document as each one in turn.
is_html = not self.is_xml
try_encodings = [user_specified_encoding, document_declared_encoding]
dammit = UnicodeDammit(markup, try_encodings, is_html=True)
return (dammit.markup, dammit.original_encoding,
dammit.declared_html_encoding,
dammit.contains_replacement_characters)
detector = EncodingDetector(markup, try_encodings, is_html)
for encoding in detector.encodings:
yield (detector.markup, encoding, document_declared_encoding, False)
def feed(self, markup):
if isinstance(markup, basestring):
if isinstance(markup, bytes):
markup = BytesIO(markup)
elif isinstance(markup, unicode):
markup = StringIO(markup)
# Call feed() at least once, even if the markup is empty,
# or the parser won't be initialized.
data = markup.read(self.CHUNK_SIZE)
self.parser.feed(data)
while data != '':
# Now call feed() on the rest of the data, chunk by chunk.
data = markup.read(self.CHUNK_SIZE)
if data != '':
self.parser.feed(data)
self.parser.close()
try:
self.parser = self.parser_for(self.soup.original_encoding)
self.parser.feed(data)
while len(data) != 0:
# Now call feed() on the rest of the data, chunk by chunk.
data = markup.read(self.CHUNK_SIZE)
if len(data) != 0:
self.parser.feed(data)
self.parser.close()
except (UnicodeDecodeError, LookupError, etree.ParserError), e:
raise ParserRejectedMarkup(str(e))
def close(self):
self.nsmaps = None
self.nsmaps = [self.DEFAULT_NSMAPS]
def start(self, name, attrs, nsmap={}):
# Make sure attrs is a mutable dict--lxml may send an immutable dictproxy.
attrs = dict(attrs)
nsprefix = None
# Invert each namespace map as it comes in.
if len(nsmap) == 0 and self.nsmaps != None:
# There are no new namespaces for this tag, but namespaces
# are in play, so we need a separate tag stack to know
# when they end.
if len(self.nsmaps) > 1:
# There are no new namespaces for this tag, but
# non-default namespaces are in play, so we need a
# separate tag stack to know when they end.
self.nsmaps.append(None)
elif len(nsmap) > 0:
# A new namespace mapping has come into play.
if self.nsmaps is None:
self.nsmaps = []
inverted_nsmap = dict((value, key) for key, value in nsmap.items())
self.nsmaps.append(inverted_nsmap)
# Also treat the namespace mapping as a set of attributes on the
@@ -111,14 +144,34 @@ class LXMLTreeBuilderForXML(TreeBuilder):
attribute = NamespacedAttribute(
"xmlns", prefix, "http://www.w3.org/2000/xmlns/")
attrs[attribute] = namespace
# Namespaces are in play. Find any attributes that came in
# from lxml with namespaces attached to their names, and
# turn then into NamespacedAttribute objects.
new_attrs = {}
for attr, value in attrs.items():
namespace, attr = self._getNsTag(attr)
if namespace is None:
new_attrs[attr] = value
else:
nsprefix = self._prefix_for_namespace(namespace)
attr = NamespacedAttribute(nsprefix, attr, namespace)
new_attrs[attr] = value
attrs = new_attrs
namespace, name = self._getNsTag(name)
if namespace is not None:
for inverted_nsmap in reversed(self.nsmaps):
if inverted_nsmap is not None and namespace in inverted_nsmap:
nsprefix = inverted_nsmap[namespace]
break
nsprefix = self._prefix_for_namespace(namespace)
self.soup.handle_starttag(name, namespace, nsprefix, attrs)
def _prefix_for_namespace(self, namespace):
"""Find the currently active prefix for the given namespace."""
if namespace is None:
return None
for inverted_nsmap in reversed(self.nsmaps):
if inverted_nsmap is not None and namespace in inverted_nsmap:
return inverted_nsmap[namespace]
return None
def end(self, name):
self.soup.endData()
completed_tag = self.soup.tagStack[-1]
@@ -130,14 +183,10 @@ class LXMLTreeBuilderForXML(TreeBuilder):
nsprefix = inverted_nsmap[namespace]
break
self.soup.handle_endtag(name, nsprefix)
if self.nsmaps != None:
if len(self.nsmaps) > 1:
# This tag, or one of its parents, introduced a namespace
# mapping, so pop it off the stack.
self.nsmaps.pop()
if len(self.nsmaps) == 0:
# Namespaces are no longer in play, so don't bother keeping
# track of the namespace stack.
self.nsmaps = None
def pi(self, target, data):
pass
@@ -166,13 +215,18 @@ class LXMLTreeBuilder(HTMLTreeBuilder, LXMLTreeBuilderForXML):
features = [LXML, HTML, FAST, PERMISSIVE]
is_xml = False
@property
def default_parser(self):
def default_parser(self, encoding):
return etree.HTMLParser
def feed(self, markup):
self.parser.feed(markup)
self.parser.close()
encoding = self.soup.original_encoding
try:
self.parser = self.parser_for(encoding)
self.parser.feed(markup)
self.parser.close()
except (UnicodeDecodeError, LookupError, etree.ParserError), e:
raise ParserRejectedMarkup(str(e))
def test_fragment_to_document(self, fragment):
"""See `TreeBuilder`."""

View File

@@ -1,27 +1,40 @@
# -*- coding: utf-8 -*-
"""Beautiful Soup bonus library: Unicode, Dammit
This class forces XML data into a standard format (usually to UTF-8 or
Unicode). It is heavily based on code from Mark Pilgrim's Universal
Feed Parser. It does not rewrite the XML or HTML to reflect a new
encoding; that's the tree builder's job.
This library converts a bytestream to Unicode through any means
necessary. It is heavily based on code from Mark Pilgrim's Universal
Feed Parser. It works best on XML and XML, but it does not rewrite the
XML or HTML to reflect a new encoding; that's the tree builder's job.
"""
import codecs
from htmlentitydefs import codepoint2name
import re
import warnings
import logging
import string
# Autodetects character encodings. Very useful.
# Download from http://chardet.feedparser.org/
# or 'apt-get install python-chardet'
# or 'easy_install chardet'
# Import a library to autodetect character encodings.
chardet_type = None
try:
import chardet
#import chardet.constants
#chardet.constants._debug = 1
# First try the fast C implementation.
# PyPI package: cchardet
import cchardet
def chardet_dammit(s):
return cchardet.detect(s)['encoding']
except ImportError:
chardet = None
try:
# Fall back to the pure Python implementation
# Debian package: python-chardet
# PyPI package: chardet
import chardet
def chardet_dammit(s):
return chardet.detect(s)['encoding']
#import chardet.constants
#chardet.constants._debug = 1
except ImportError:
# No chardet available.
def chardet_dammit(s):
return None
# Available from http://cjkpython.i18n.org/.
try:
@@ -69,6 +82,8 @@ class EntitySubstitution(object):
"&(?!#\d+;|#x[0-9a-fA-F]+;|\w+;)"
")")
AMPERSAND_OR_BRACKET = re.compile("([<>&])")
@classmethod
def _substitute_html_entity(cls, matchobj):
entity = cls.CHARACTER_TO_HTML_ENTITY.get(matchobj.group(0))
@@ -122,6 +137,28 @@ class EntitySubstitution(object):
def substitute_xml(cls, value, make_quoted_attribute=False):
"""Substitute XML entities for special XML characters.
:param value: A string to be substituted. The less-than sign
will become &lt;, the greater-than sign will become &gt;,
and any ampersands will become &amp;. If you want ampersands
that appear to be part of an entity definition to be left
alone, use substitute_xml_containing_entities() instead.
:param make_quoted_attribute: If True, then the string will be
quoted, as befits an attribute value.
"""
# Escape angle brackets and ampersands.
value = cls.AMPERSAND_OR_BRACKET.sub(
cls._substitute_xml_entity, value)
if make_quoted_attribute:
value = cls.quoted_attribute_value(value)
return value
@classmethod
def substitute_xml_containing_entities(
cls, value, make_quoted_attribute=False):
"""Substitute XML entities for special XML characters.
:param value: A string to be substituted. The less-than sign will
become &lt;, the greater-than sign will become &gt;, and any
ampersands that are not part of an entity defition will
@@ -155,6 +192,125 @@ class EntitySubstitution(object):
cls._substitute_html_entity, s)
class EncodingDetector:
"""Suggests a number of possible encodings for a bytestring.
Order of precedence:
1. Encodings you specifically tell EncodingDetector to try first
(the override_encodings argument to the constructor).
2. An encoding declared within the bytestring itself, either in an
XML declaration (if the bytestring is to be interpreted as an XML
document), or in a <meta> tag (if the bytestring is to be
interpreted as an HTML document.)
3. An encoding detected through textual analysis by chardet,
cchardet, or a similar external library.
4. UTF-8.
5. Windows-1252.
"""
def __init__(self, markup, override_encodings=None, is_html=False):
self.override_encodings = override_encodings or []
self.chardet_encoding = None
self.is_html = is_html
self.declared_encoding = None
# First order of business: strip a byte-order mark.
self.markup, self.sniffed_encoding = self.strip_byte_order_mark(markup)
def _usable(self, encoding, tried):
if encoding is not None:
encoding = encoding.lower()
if encoding not in tried:
tried.add(encoding)
return True
return False
@property
def encodings(self):
"""Yield a number of encodings that might work for this markup."""
tried = set()
for e in self.override_encodings:
if self._usable(e, tried):
yield e
# Did the document originally start with a byte-order mark
# that indicated its encoding?
if self._usable(self.sniffed_encoding, tried):
yield self.sniffed_encoding
# Look within the document for an XML or HTML encoding
# declaration.
if self.declared_encoding is None:
self.declared_encoding = self.find_declared_encoding(
self.markup, self.is_html)
if self._usable(self.declared_encoding, tried):
yield self.declared_encoding
# Use third-party character set detection to guess at the
# encoding.
if self.chardet_encoding is None:
self.chardet_encoding = chardet_dammit(self.markup)
if self._usable(self.chardet_encoding, tried):
yield self.chardet_encoding
# As a last-ditch effort, try utf-8 and windows-1252.
for e in ('utf-8', 'windows-1252'):
if self._usable(e, tried):
yield e
@classmethod
def strip_byte_order_mark(cls, data):
"""If a byte-order mark is present, strip it and return the encoding it implies."""
encoding = None
if (len(data) >= 4) and (data[:2] == b'\xfe\xff') \
and (data[2:4] != '\x00\x00'):
encoding = 'utf-16be'
data = data[2:]
elif (len(data) >= 4) and (data[:2] == b'\xff\xfe') \
and (data[2:4] != '\x00\x00'):
encoding = 'utf-16le'
data = data[2:]
elif data[:3] == b'\xef\xbb\xbf':
encoding = 'utf-8'
data = data[3:]
elif data[:4] == b'\x00\x00\xfe\xff':
encoding = 'utf-32be'
data = data[4:]
elif data[:4] == b'\xff\xfe\x00\x00':
encoding = 'utf-32le'
data = data[4:]
return data, encoding
@classmethod
def find_declared_encoding(cls, markup, is_html=False, search_entire_document=False):
"""Given a document, tries to find its declared encoding.
An XML encoding is declared at the beginning of the document.
An HTML encoding is declared in a <meta> tag, hopefully near the
beginning of the document.
"""
if search_entire_document:
xml_endpos = html_endpos = len(markup)
else:
xml_endpos = 1024
html_endpos = max(2048, int(len(markup) * 0.05))
declared_encoding = None
declared_encoding_match = xml_encoding_re.search(markup, endpos=xml_endpos)
if not declared_encoding_match and is_html:
declared_encoding_match = html_meta_re.search(markup, endpos=html_endpos)
if declared_encoding_match is not None:
declared_encoding = declared_encoding_match.groups()[0].decode(
'ascii')
if declared_encoding:
return declared_encoding.lower()
return None
class UnicodeDammit:
"""A class for detecting the encoding of a *ML document and
converting it to a Unicode string. If the source encoding is
@@ -176,65 +332,48 @@ class UnicodeDammit:
def __init__(self, markup, override_encodings=[],
smart_quotes_to=None, is_html=False):
self.declared_html_encoding = None
self.smart_quotes_to = smart_quotes_to
self.tried_encodings = []
self.contains_replacement_characters = False
self.is_html = is_html
if markup == '' or isinstance(markup, unicode):
self.detector = EncodingDetector(markup, override_encodings, is_html)
# Short-circuit if the data is in Unicode to begin with.
if isinstance(markup, unicode) or markup == '':
self.markup = markup
self.unicode_markup = unicode(markup)
self.original_encoding = None
return
new_markup, document_encoding, sniffed_encoding = \
self._detectEncoding(markup, is_html)
self.markup = new_markup
# The encoding detector may have stripped a byte-order mark.
# Use the stripped markup from this point on.
self.markup = self.detector.markup
u = None
if new_markup != markup:
# _detectEncoding modified the markup, then converted it to
# Unicode and then to UTF-8. So convert it from UTF-8.
u = self._convert_from("utf8")
self.original_encoding = sniffed_encoding
for encoding in self.detector.encodings:
markup = self.detector.markup
u = self._convert_from(encoding)
if u is not None:
break
if not u:
for proposed_encoding in (
override_encodings + [document_encoding, sniffed_encoding]):
if proposed_encoding is not None:
u = self._convert_from(proposed_encoding)
if u:
break
# None of the encodings worked. As an absolute last resort,
# try them again with character replacement.
# If no luck and we have auto-detection library, try that:
if not u and chardet and not isinstance(self.markup, unicode):
u = self._convert_from(chardet.detect(self.markup)['encoding'])
# As a last resort, try utf-8 and windows-1252:
if not u:
for proposed_encoding in ("utf-8", "windows-1252"):
u = self._convert_from(proposed_encoding)
if u:
break
# As an absolute last resort, try the encodings again with
# character replacement.
if not u:
for proposed_encoding in (
override_encodings + [
document_encoding, sniffed_encoding, "utf-8", "windows-1252"]):
if proposed_encoding != "ascii":
u = self._convert_from(proposed_encoding, "replace")
for encoding in self.detector.encodings:
if encoding != "ascii":
u = self._convert_from(encoding, "replace")
if u is not None:
warnings.warn(
UnicodeWarning(
logging.warning(
"Some characters could not be decoded, and were "
"replaced with REPLACEMENT CHARACTER."))
"replaced with REPLACEMENT CHARACTER.")
self.contains_replacement_characters = True
break
# We could at this point force it to ASCII, but that would
# destroy so much data that I think giving up is better
# If none of that worked, we could at this point force it to
# ASCII, but that would destroy so much data that I think
# giving up is better.
self.unicode_markup = u
if not u:
self.original_encoding = None
@@ -262,11 +401,10 @@ class UnicodeDammit:
return None
self.tried_encodings.append((proposed, errors))
markup = self.markup
# Convert smart quotes to HTML if coming from an encoding
# that might have them.
if (self.smart_quotes_to is not None
and proposed.lower() in self.ENCODINGS_WITH_SMART_QUOTES):
and proposed in self.ENCODINGS_WITH_SMART_QUOTES):
smart_quotes_re = b"([\x80-\x9f])"
smart_quotes_compiled = re.compile(smart_quotes_re)
markup = smart_quotes_compiled.sub(self._sub_ms_char, markup)
@@ -287,99 +425,24 @@ class UnicodeDammit:
def _to_unicode(self, data, encoding, errors="strict"):
'''Given a string and its encoding, decodes the string into Unicode.
%encoding is a string recognized by encodings.aliases'''
return unicode(data, encoding, errors)
# strip Byte Order Mark (if present)
if (len(data) >= 4) and (data[:2] == '\xfe\xff') \
and (data[2:4] != '\x00\x00'):
encoding = 'utf-16be'
data = data[2:]
elif (len(data) >= 4) and (data[:2] == '\xff\xfe') \
and (data[2:4] != '\x00\x00'):
encoding = 'utf-16le'
data = data[2:]
elif data[:3] == '\xef\xbb\xbf':
encoding = 'utf-8'
data = data[3:]
elif data[:4] == '\x00\x00\xfe\xff':
encoding = 'utf-32be'
data = data[4:]
elif data[:4] == '\xff\xfe\x00\x00':
encoding = 'utf-32le'
data = data[4:]
newdata = unicode(data, encoding, errors)
return newdata
def _detectEncoding(self, xml_data, is_html=False):
"""Given a document, tries to detect its XML encoding."""
xml_encoding = sniffed_xml_encoding = None
try:
if xml_data[:4] == b'\x4c\x6f\xa7\x94':
# EBCDIC
xml_data = self._ebcdic_to_ascii(xml_data)
elif xml_data[:4] == b'\x00\x3c\x00\x3f':
# UTF-16BE
sniffed_xml_encoding = 'utf-16be'
xml_data = unicode(xml_data, 'utf-16be').encode('utf-8')
elif (len(xml_data) >= 4) and (xml_data[:2] == b'\xfe\xff') \
and (xml_data[2:4] != b'\x00\x00'):
# UTF-16BE with BOM
sniffed_xml_encoding = 'utf-16be'
xml_data = unicode(xml_data[2:], 'utf-16be').encode('utf-8')
elif xml_data[:4] == b'\x3c\x00\x3f\x00':
# UTF-16LE
sniffed_xml_encoding = 'utf-16le'
xml_data = unicode(xml_data, 'utf-16le').encode('utf-8')
elif (len(xml_data) >= 4) and (xml_data[:2] == b'\xff\xfe') and \
(xml_data[2:4] != b'\x00\x00'):
# UTF-16LE with BOM
sniffed_xml_encoding = 'utf-16le'
xml_data = unicode(xml_data[2:], 'utf-16le').encode('utf-8')
elif xml_data[:4] == b'\x00\x00\x00\x3c':
# UTF-32BE
sniffed_xml_encoding = 'utf-32be'
xml_data = unicode(xml_data, 'utf-32be').encode('utf-8')
elif xml_data[:4] == b'\x3c\x00\x00\x00':
# UTF-32LE
sniffed_xml_encoding = 'utf-32le'
xml_data = unicode(xml_data, 'utf-32le').encode('utf-8')
elif xml_data[:4] == b'\x00\x00\xfe\xff':
# UTF-32BE with BOM
sniffed_xml_encoding = 'utf-32be'
xml_data = unicode(xml_data[4:], 'utf-32be').encode('utf-8')
elif xml_data[:4] == b'\xff\xfe\x00\x00':
# UTF-32LE with BOM
sniffed_xml_encoding = 'utf-32le'
xml_data = unicode(xml_data[4:], 'utf-32le').encode('utf-8')
elif xml_data[:3] == b'\xef\xbb\xbf':
# UTF-8 with BOM
sniffed_xml_encoding = 'utf-8'
xml_data = unicode(xml_data[3:], 'utf-8').encode('utf-8')
else:
sniffed_xml_encoding = 'ascii'
pass
except:
xml_encoding_match = None
xml_encoding_match = xml_encoding_re.match(xml_data)
if not xml_encoding_match and is_html:
xml_encoding_match = html_meta_re.search(xml_data)
if xml_encoding_match is not None:
xml_encoding = xml_encoding_match.groups()[0].decode(
'ascii').lower()
if is_html:
self.declared_html_encoding = xml_encoding
if sniffed_xml_encoding and \
(xml_encoding in ('iso-10646-ucs-2', 'ucs-2', 'csunicode',
'iso-10646-ucs-4', 'ucs-4', 'csucs4',
'utf-16', 'utf-32', 'utf_16', 'utf_32',
'utf16', 'u16')):
xml_encoding = sniffed_xml_encoding
return xml_data, xml_encoding, sniffed_xml_encoding
@property
def declared_html_encoding(self):
if not self.is_html:
return None
return self.detector.declared_encoding
def find_codec(self, charset):
return self._codec(self.CHARSET_ALIASES.get(charset, charset)) \
or (charset and self._codec(charset.replace("-", ""))) \
or (charset and self._codec(charset.replace("-", "_"))) \
value = (self._codec(self.CHARSET_ALIASES.get(charset, charset))
or (charset and self._codec(charset.replace("-", "")))
or (charset and self._codec(charset.replace("-", "_")))
or (charset and charset.lower())
or charset
)
if value:
return value.lower()
return None
def _codec(self, charset):
if not charset:
@@ -392,32 +455,6 @@ class UnicodeDammit:
pass
return codec
EBCDIC_TO_ASCII_MAP = None
def _ebcdic_to_ascii(self, s):
c = self.__class__
if not c.EBCDIC_TO_ASCII_MAP:
emap = (0,1,2,3,156,9,134,127,151,141,142,11,12,13,14,15,
16,17,18,19,157,133,8,135,24,25,146,143,28,29,30,31,
128,129,130,131,132,10,23,27,136,137,138,139,140,5,6,7,
144,145,22,147,148,149,150,4,152,153,154,155,20,21,158,26,
32,160,161,162,163,164,165,166,167,168,91,46,60,40,43,33,
38,169,170,171,172,173,174,175,176,177,93,36,42,41,59,94,
45,47,178,179,180,181,182,183,184,185,124,44,37,95,62,63,
186,187,188,189,190,191,192,193,194,96,58,35,64,39,61,34,
195,97,98,99,100,101,102,103,104,105,196,197,198,199,200,
201,202,106,107,108,109,110,111,112,113,114,203,204,205,
206,207,208,209,126,115,116,117,118,119,120,121,122,210,
211,212,213,214,215,216,217,218,219,220,221,222,223,224,
225,226,227,228,229,230,231,123,65,66,67,68,69,70,71,72,
73,232,233,234,235,236,237,125,74,75,76,77,78,79,80,81,
82,238,239,240,241,242,243,92,159,83,84,85,86,87,88,89,
90,244,245,246,247,248,249,48,49,50,51,52,53,54,55,56,57,
250,251,252,253,254,255)
import string
c.EBCDIC_TO_ASCII_MAP = string.maketrans(
''.join(map(chr, list(range(256)))), ''.join(map(chr, emap)))
return s.translate(c.EBCDIC_TO_ASCII_MAP)
# A partial mapping of ISO-Latin-1 to HTML entities/XML numeric entities.
MS_CHARS = {b'\x80': ('euro', '20AC'),

204
libs/bs4/diagnose.py Normal file
View File

@@ -0,0 +1,204 @@
"""Diagnostic functions, mainly for use when doing tech support."""
import cProfile
from StringIO import StringIO
from HTMLParser import HTMLParser
import bs4
from bs4 import BeautifulSoup, __version__
from bs4.builder import builder_registry
import os
import pstats
import random
import tempfile
import time
import traceback
import sys
import cProfile
def diagnose(data):
"""Diagnostic suite for isolating common problems."""
print "Diagnostic running on Beautiful Soup %s" % __version__
print "Python version %s" % sys.version
basic_parsers = ["html.parser", "html5lib", "lxml"]
for name in basic_parsers:
for builder in builder_registry.builders:
if name in builder.features:
break
else:
basic_parsers.remove(name)
print (
"I noticed that %s is not installed. Installing it may help." %
name)
if 'lxml' in basic_parsers:
basic_parsers.append(["lxml", "xml"])
from lxml import etree
print "Found lxml version %s" % ".".join(map(str,etree.LXML_VERSION))
if 'html5lib' in basic_parsers:
import html5lib
print "Found html5lib version %s" % html5lib.__version__
if hasattr(data, 'read'):
data = data.read()
elif os.path.exists(data):
print '"%s" looks like a filename. Reading data from the file.' % data
data = open(data).read()
elif data.startswith("http:") or data.startswith("https:"):
print '"%s" looks like a URL. Beautiful Soup is not an HTTP client.' % data
print "You need to use some other library to get the document behind the URL, and feed that document to Beautiful Soup."
return
print
for parser in basic_parsers:
print "Trying to parse your markup with %s" % parser
success = False
try:
soup = BeautifulSoup(data, parser)
success = True
except Exception, e:
print "%s could not parse the markup." % parser
traceback.print_exc()
if success:
print "Here's what %s did with the markup:" % parser
print soup.prettify()
print "-" * 80
def lxml_trace(data, html=True, **kwargs):
"""Print out the lxml events that occur during parsing.
This lets you see how lxml parses a document when no Beautiful
Soup code is running.
"""
from lxml import etree
for event, element in etree.iterparse(StringIO(data), html=html, **kwargs):
print("%s, %4s, %s" % (event, element.tag, element.text))
class AnnouncingParser(HTMLParser):
"""Announces HTMLParser parse events, without doing anything else."""
def _p(self, s):
print(s)
def handle_starttag(self, name, attrs):
self._p("%s START" % name)
def handle_endtag(self, name):
self._p("%s END" % name)
def handle_data(self, data):
self._p("%s DATA" % data)
def handle_charref(self, name):
self._p("%s CHARREF" % name)
def handle_entityref(self, name):
self._p("%s ENTITYREF" % name)
def handle_comment(self, data):
self._p("%s COMMENT" % data)
def handle_decl(self, data):
self._p("%s DECL" % data)
def unknown_decl(self, data):
self._p("%s UNKNOWN-DECL" % data)
def handle_pi(self, data):
self._p("%s PI" % data)
def htmlparser_trace(data):
"""Print out the HTMLParser events that occur during parsing.
This lets you see how HTMLParser parses a document when no
Beautiful Soup code is running.
"""
parser = AnnouncingParser()
parser.feed(data)
_vowels = "aeiou"
_consonants = "bcdfghjklmnpqrstvwxyz"
def rword(length=5):
"Generate a random word-like string."
s = ''
for i in range(length):
if i % 2 == 0:
t = _consonants
else:
t = _vowels
s += random.choice(t)
return s
def rsentence(length=4):
"Generate a random sentence-like string."
return " ".join(rword(random.randint(4,9)) for i in range(length))
def rdoc(num_elements=1000):
"""Randomly generate an invalid HTML document."""
tag_names = ['p', 'div', 'span', 'i', 'b', 'script', 'table']
elements = []
for i in range(num_elements):
choice = random.randint(0,3)
if choice == 0:
# New tag.
tag_name = random.choice(tag_names)
elements.append("<%s>" % tag_name)
elif choice == 1:
elements.append(rsentence(random.randint(1,4)))
elif choice == 2:
# Close a tag.
tag_name = random.choice(tag_names)
elements.append("</%s>" % tag_name)
return "<html>" + "\n".join(elements) + "</html>"
def benchmark_parsers(num_elements=100000):
"""Very basic head-to-head performance benchmark."""
print "Comparative parser benchmark on Beautiful Soup %s" % __version__
data = rdoc(num_elements)
print "Generated a large invalid HTML document (%d bytes)." % len(data)
for parser in ["lxml", ["lxml", "html"], "html5lib", "html.parser"]:
success = False
try:
a = time.time()
soup = BeautifulSoup(data, parser)
b = time.time()
success = True
except Exception, e:
print "%s could not parse the markup." % parser
traceback.print_exc()
if success:
print "BS4+%s parsed the markup in %.2fs." % (parser, b-a)
from lxml import etree
a = time.time()
etree.HTML(data)
b = time.time()
print "Raw lxml parsed the markup in %.2fs." % (b-a)
import html5lib
parser = html5lib.HTMLParser()
a = time.time()
parser.parse(data)
b = time.time()
print "Raw html5lib parsed the markup in %.2fs." % (b-a)
def profile(num_elements=100000, parser="lxml"):
filehandle = tempfile.NamedTemporaryFile()
filename = filehandle.name
data = rdoc(num_elements)
vars = dict(bs4=bs4, data=data, parser=parser)
cProfile.runctx('bs4.BeautifulSoup(data, parser)' , vars, vars, filename)
stats = pstats.Stats(filename)
# stats.strip_dirs()
stats.sort_stats("cumulative")
stats.print_stats('_html5lib|bs4', 50)
if __name__ == '__main__':
diagnose(sys.stdin.read())

View File

@@ -26,6 +26,9 @@ class NamespacedAttribute(unicode):
def __new__(cls, prefix, name, namespace=None):
if name is None:
obj = unicode.__new__(cls, prefix)
elif prefix is None:
# Not really namespaced.
obj = unicode.__new__(cls, name)
else:
obj = unicode.__new__(cls, prefix + ":" + name)
obj.prefix = prefix
@@ -78,6 +81,40 @@ class ContentMetaAttributeValue(AttributeValueWithCharsetSubstitution):
return match.group(1) + encoding
return self.CHARSET_RE.sub(rewrite, self.original_value)
class HTMLAwareEntitySubstitution(EntitySubstitution):
"""Entity substitution rules that are aware of some HTML quirks.
Specifically, the contents of <script> and <style> tags should not
undergo entity substitution.
Incoming NavigableString objects are checked to see if they're the
direct children of a <script> or <style> tag.
"""
cdata_containing_tags = set(["script", "style"])
preformatted_tags = set(["pre"])
@classmethod
def _substitute_if_appropriate(cls, ns, f):
if (isinstance(ns, NavigableString)
and ns.parent is not None
and ns.parent.name in cls.cdata_containing_tags):
# Do nothing.
return ns
# Substitute.
return f(ns)
@classmethod
def substitute_html(cls, ns):
return cls._substitute_if_appropriate(
ns, EntitySubstitution.substitute_html)
@classmethod
def substitute_xml(cls, ns):
return cls._substitute_if_appropriate(
ns, EntitySubstitution.substitute_xml)
class PageElement(object):
"""Contains the navigational information for some part of the page
@@ -94,25 +131,60 @@ class PageElement(object):
# converted to entities. This is not recommended, but it's
# faster than "minimal".
# A function - This function will be called on every string that
# needs to undergo entity substition
FORMATTERS = {
# needs to undergo entity substitution.
#
# In an HTML document, the default "html" and "minimal" functions
# will leave the contents of <script> and <style> tags alone. For
# an XML document, all tags will be given the same treatment.
HTML_FORMATTERS = {
"html" : HTMLAwareEntitySubstitution.substitute_html,
"minimal" : HTMLAwareEntitySubstitution.substitute_xml,
None : None
}
XML_FORMATTERS = {
"html" : EntitySubstitution.substitute_html,
"minimal" : EntitySubstitution.substitute_xml,
None : None
}
@classmethod
def format_string(self, s, formatter='minimal'):
"""Format the given string using the given formatter."""
if not callable(formatter):
formatter = self.FORMATTERS.get(
formatter, EntitySubstitution.substitute_xml)
formatter = self._formatter_for_name(formatter)
if formatter is None:
output = s
else:
output = formatter(s)
return output
@property
def _is_xml(self):
"""Is this element part of an XML tree or an HTML tree?
This is used when mapping a formatter name ("minimal") to an
appropriate function (one that performs entity-substitution on
the contents of <script> and <style> tags, or not). It's
inefficient, but it should be called very rarely.
"""
if self.parent is None:
# This is the top-level object. It should have .is_xml set
# from tree creation. If not, take a guess--BS is usually
# used on HTML markup.
return getattr(self, 'is_xml', False)
return self.parent._is_xml
def _formatter_for_name(self, name):
"Look up a formatter function based on its name and the tree."
if self._is_xml:
return self.XML_FORMATTERS.get(
name, EntitySubstitution.substitute_xml)
else:
return self.HTML_FORMATTERS.get(
name, HTMLAwareEntitySubstitution.substitute_xml)
def setup(self, parent=None, previous_element=None):
"""Sets up the initial relations between this element and
other elements."""
@@ -183,11 +255,16 @@ class PageElement(object):
self.previous_sibling = self.next_sibling = None
return self
def _last_descendant(self):
def _last_descendant(self, is_initialized=True, accept_self=True):
"Finds the last element beneath this object to be parsed."
last_child = self
while hasattr(last_child, 'contents') and last_child.contents:
last_child = last_child.contents[-1]
if is_initialized and self.next_sibling:
last_child = self.next_sibling.previous_element
else:
last_child = self
while isinstance(last_child, Tag) and last_child.contents:
last_child = last_child.contents[-1]
if not accept_self and last_child == self:
last_child = None
return last_child
# BS3: Not part of the API!
_lastRecursiveChild = _last_descendant
@@ -222,11 +299,11 @@ class PageElement(object):
previous_child = self.contents[position - 1]
new_child.previous_sibling = previous_child
new_child.previous_sibling.next_sibling = new_child
new_child.previous_element = previous_child._last_descendant()
new_child.previous_element = previous_child._last_descendant(False)
if new_child.previous_element is not None:
new_child.previous_element.next_element = new_child
new_childs_last_element = new_child._last_descendant()
new_childs_last_element = new_child._last_descendant(False)
if position >= len(self.contents):
new_child.next_sibling = None
@@ -366,7 +443,7 @@ class PageElement(object):
# NOTE: We can't use _find_one because findParents takes a different
# set of arguments.
r = None
l = self.find_parents(name, attrs, 1)
l = self.find_parents(name, attrs, 1, **kwargs)
if l:
r = l[0]
return r
@@ -403,20 +480,21 @@ class PageElement(object):
if isinstance(name, SoupStrainer):
strainer = name
elif text is None and not limit and not attrs and not kwargs:
# Optimization to find all tags.
if name is True or name is None:
return [element for element in generator
if isinstance(element, Tag)]
# Optimization to find all tags with a given name.
elif isinstance(name, basestring):
return [element for element in generator
if isinstance(element, Tag) and element.name == name]
else:
strainer = SoupStrainer(name, attrs, text, **kwargs)
else:
# Build a SoupStrainer
strainer = SoupStrainer(name, attrs, text, **kwargs)
if text is None and not limit and not attrs and not kwargs:
if name is True or name is None:
# Optimization to find all tags.
result = (element for element in generator
if isinstance(element, Tag))
return ResultSet(strainer, result)
elif isinstance(name, basestring):
# Optimization to find all tags with a given name.
result = (element for element in generator
if isinstance(element, Tag)
and element.name == name)
return ResultSet(strainer, result)
results = ResultSet(strainer)
while True:
try:
@@ -495,6 +573,14 @@ class PageElement(object):
value =" ".join(value)
return value
def _tag_name_matches_and(self, function, tag_name):
if not tag_name:
return function
else:
def _match(tag):
return tag.name == tag_name and function(tag)
return _match
def _attribute_checker(self, operator, attribute, value=''):
"""Create a function that performs a CSS selector operation.
@@ -536,87 +622,6 @@ class PageElement(object):
else:
return lambda el: el.has_attr(attribute)
def select(self, selector):
"""Perform a CSS selection operation on the current element."""
tokens = selector.split()
current_context = [self]
for index, token in enumerate(tokens):
if tokens[index - 1] == '>':
# already found direct descendants in last step. skip this
# step.
continue
m = self.attribselect_re.match(token)
if m is not None:
# Attribute selector
tag, attribute, operator, value = m.groups()
if not tag:
tag = True
checker = self._attribute_checker(operator, attribute, value)
found = []
for context in current_context:
found.extend(
[el for el in context.find_all(tag) if checker(el)])
current_context = found
continue
if '#' in token:
# ID selector
tag, id = token.split('#', 1)
if tag == "":
tag = True
el = current_context[0].find(tag, {'id': id})
if el is None:
return [] # No match
current_context = [el]
continue
if '.' in token:
# Class selector
tag_name, klass = token.split('.', 1)
if not tag_name:
tag_name = True
classes = set(klass.split('.'))
found = []
def classes_match(tag):
if tag_name is not True and tag.name != tag_name:
return False
if not tag.has_attr('class'):
return False
return classes.issubset(tag['class'])
for context in current_context:
found.extend(context.find_all(classes_match))
current_context = found
continue
if token == '*':
# Star selector
found = []
for context in current_context:
found.extend(context.findAll(True))
current_context = found
continue
if token == '>':
# Child selector
tag = tokens[index + 1]
if not tag:
tag = True
found = []
for context in current_context:
found.extend(context.find_all(tag, recursive=False))
current_context = found
continue
# Here we should just have a regular tag
if not self.tag_name_re.match(token):
return []
found = []
for context in current_context:
found.extend(context.findAll(token))
current_context = found
return current_context
# Old non-property versions of the generators, for backwards
# compatibility with BS3.
def nextGenerator(self):
@@ -652,6 +657,9 @@ class NavigableString(unicode, PageElement):
return unicode.__new__(cls, value)
return unicode.__new__(cls, value, DEFAULT_OUTPUT_ENCODING)
def __copy__(self):
return self
def __getnewargs__(self):
return (unicode(self),)
@@ -670,6 +678,13 @@ class NavigableString(unicode, PageElement):
output = self.format_string(self, formatter)
return self.PREFIX + output + self.SUFFIX
@property
def name(self):
return None
@name.setter
def name(self, name):
raise AttributeError("A NavigableString cannot be given a name.")
class PreformattedString(NavigableString):
"""A NavigableString not subject to the normal formatting rules.
@@ -709,7 +724,7 @@ class Doctype(PreformattedString):
@classmethod
def for_name_and_ids(cls, name, pub_id, system_id):
value = name
value = name or ''
if pub_id is not None:
value += ' PUBLIC "%s"' % pub_id
if system_id is not None:
@@ -744,7 +759,7 @@ class Tag(PageElement):
self.prefix = prefix
if attrs is None:
attrs = {}
elif builder.cdata_list_attributes:
elif attrs and builder.cdata_list_attributes:
attrs = builder._replace_cdata_list_attribute_values(
self.name, attrs)
else:
@@ -803,16 +818,24 @@ class Tag(PageElement):
self.clear()
self.append(string.__class__(string))
def _all_strings(self, strip=False):
"""Yield all child strings, possibly stripping them."""
def _all_strings(self, strip=False, types=(NavigableString, CData)):
"""Yield all strings of certain classes, possibly stripping them.
By default, yields only NavigableString and CData objects. So
no comments, processing instructions, etc.
"""
for descendant in self.descendants:
if not isinstance(descendant, NavigableString):
if (
(types is None and not isinstance(descendant, NavigableString))
or
(types is not None and type(descendant) not in types)):
continue
if strip:
descendant = descendant.strip()
if len(descendant) == 0:
continue
yield descendant
strings = property(_all_strings)
@property
@@ -820,11 +843,13 @@ class Tag(PageElement):
for string in self._all_strings(True):
yield string
def get_text(self, separator="", strip=False):
def get_text(self, separator=u"", strip=False,
types=(NavigableString, CData)):
"""
Get all child strings, concatenated using the given separator.
"""
return separator.join([s for s in self._all_strings(strip)])
return separator.join([s for s in self._all_strings(
strip, types=types)])
getText = get_text
text = property(get_text)
@@ -835,6 +860,7 @@ class Tag(PageElement):
while i is not None:
next = i.next_element
i.__dict__.clear()
i.contents = []
i = next
def clear(self, decompose=False):
@@ -966,6 +992,13 @@ class Tag(PageElement):
u = self.decode(indent_level, encoding, formatter)
return u.encode(encoding, errors)
def _should_pretty_print(self, indent_level):
"""Should this tag be pretty-printed?"""
return (
indent_level is not None and
(self.name not in HTMLAwareEntitySubstitution.preformatted_tags
or self._is_xml))
def decode(self, indent_level=None,
eventual_encoding=DEFAULT_OUTPUT_ENCODING,
formatter="minimal"):
@@ -978,6 +1011,12 @@ class Tag(PageElement):
document contains a <META> tag that mentions the document's
encoding.
"""
# First off, turn a string formatter into a function. This
# will stop the lookup from happening over and over again.
if not callable(formatter):
formatter = self._formatter_for_name(formatter)
attrs = []
if self.attrs:
for key, val in sorted(self.attrs.items()):
@@ -987,7 +1026,7 @@ class Tag(PageElement):
if isinstance(val, list) or isinstance(val, tuple):
val = ' '.join(val)
elif not isinstance(val, basestring):
val = str(val)
val = unicode(val)
elif (
isinstance(val, AttributeValueWithCharsetSubstitution)
and eventual_encoding is not None):
@@ -995,26 +1034,30 @@ class Tag(PageElement):
text = self.format_string(val, formatter)
decoded = (
str(key) + '='
unicode(key) + '='
+ EntitySubstitution.quoted_attribute_value(text))
attrs.append(decoded)
close = ''
closeTag = ''
if self.is_empty_element:
close = '/'
else:
closeTag = '</%s>' % self.name
prefix = ''
if self.prefix:
prefix = self.prefix + ":"
pretty_print = (indent_level is not None)
if self.is_empty_element:
close = '/'
else:
closeTag = '</%s%s>' % (prefix, self.name)
pretty_print = self._should_pretty_print(indent_level)
space = ''
indent_space = ''
if indent_level is not None:
indent_space = (' ' * (indent_level - 1))
if pretty_print:
space = (' ' * (indent_level - 1))
space = indent_space
indent_contents = indent_level + 1
else:
space = ''
indent_contents = None
contents = self.decode_contents(
indent_contents, eventual_encoding, formatter)
@@ -1027,8 +1070,10 @@ class Tag(PageElement):
attribute_string = ''
if attrs:
attribute_string = ' ' + ' '.join(attrs)
if pretty_print:
s.append(space)
if indent_level is not None:
# Even if this particular tag is not pretty-printed,
# we should indent up to the start of the tag.
s.append(indent_space)
s.append('<%s%s%s%s>' % (
prefix, self.name, attribute_string, close))
if pretty_print:
@@ -1039,7 +1084,10 @@ class Tag(PageElement):
if pretty_print and closeTag:
s.append(space)
s.append(closeTag)
if pretty_print and closeTag and self.next_sibling:
if indent_level is not None and closeTag and self.next_sibling:
# Even if this particular tag is not pretty-printed,
# we're now done with the tag, and we should add a
# newline if appropriate.
s.append("\n")
s = ''.join(s)
return s
@@ -1062,6 +1110,11 @@ class Tag(PageElement):
document contains a <META> tag that mentions the document's
encoding.
"""
# First off, turn a string formatter into a function. This
# will stop the lookup from happening over and over again.
if not callable(formatter):
formatter = self._formatter_for_name(formatter)
pretty_print = (indent_level is not None)
s = []
for c in self:
@@ -1071,13 +1124,13 @@ class Tag(PageElement):
elif isinstance(c, Tag):
s.append(c.decode(indent_level, eventual_encoding,
formatter))
if text and indent_level:
if text and indent_level and not self.name == 'pre':
text = text.strip()
if text:
if pretty_print:
if pretty_print and not self.name == 'pre':
s.append(" " * (indent_level - 1))
s.append(text)
if pretty_print:
if pretty_print and not self.name == 'pre':
s.append("\n")
return ''.join(s)
@@ -1120,6 +1173,7 @@ class Tag(PageElement):
callable that takes a string and returns whether or not the
string matches for some custom definition of 'matches'. The
same is true of the tag name."""
generator = self.descendants
if not recursive:
generator = self.children
@@ -1143,6 +1197,207 @@ class Tag(PageElement):
yield current
current = current.next_element
# CSS selector code
_selector_combinators = ['>', '+', '~']
_select_debug = False
def select(self, selector, _candidate_generator=None):
"""Perform a CSS selection operation on the current element."""
tokens = selector.split()
current_context = [self]
if tokens[-1] in self._selector_combinators:
raise ValueError(
'Final combinator "%s" is missing an argument.' % tokens[-1])
if self._select_debug:
print 'Running CSS selector "%s"' % selector
for index, token in enumerate(tokens):
if self._select_debug:
print ' Considering token "%s"' % token
recursive_candidate_generator = None
tag_name = None
if tokens[index-1] in self._selector_combinators:
# This token was consumed by the previous combinator. Skip it.
if self._select_debug:
print ' Token was consumed by the previous combinator.'
continue
# Each operation corresponds to a checker function, a rule
# for determining whether a candidate matches the
# selector. Candidates are generated by the active
# iterator.
checker = None
m = self.attribselect_re.match(token)
if m is not None:
# Attribute selector
tag_name, attribute, operator, value = m.groups()
checker = self._attribute_checker(operator, attribute, value)
elif '#' in token:
# ID selector
tag_name, tag_id = token.split('#', 1)
def id_matches(tag):
return tag.get('id', None) == tag_id
checker = id_matches
elif '.' in token:
# Class selector
tag_name, klass = token.split('.', 1)
classes = set(klass.split('.'))
def classes_match(candidate):
return classes.issubset(candidate.get('class', []))
checker = classes_match
elif ':' in token:
# Pseudo-class
tag_name, pseudo = token.split(':', 1)
if tag_name == '':
raise ValueError(
"A pseudo-class must be prefixed with a tag name.")
pseudo_attributes = re.match('([a-zA-Z\d-]+)\(([a-zA-Z\d]+)\)', pseudo)
found = []
if pseudo_attributes is not None:
pseudo_type, pseudo_value = pseudo_attributes.groups()
if pseudo_type == 'nth-of-type':
try:
pseudo_value = int(pseudo_value)
except:
raise NotImplementedError(
'Only numeric values are currently supported for the nth-of-type pseudo-class.')
if pseudo_value < 1:
raise ValueError(
'nth-of-type pseudo-class value must be at least 1.')
class Counter(object):
def __init__(self, destination):
self.count = 0
self.destination = destination
def nth_child_of_type(self, tag):
self.count += 1
if self.count == self.destination:
return True
if self.count > self.destination:
# Stop the generator that's sending us
# these things.
raise StopIteration()
return False
checker = Counter(pseudo_value).nth_child_of_type
else:
raise NotImplementedError(
'Only the following pseudo-classes are implemented: nth-of-type.')
elif token == '*':
# Star selector -- matches everything
pass
elif token == '>':
# Run the next token as a CSS selector against the
# direct children of each tag in the current context.
recursive_candidate_generator = lambda tag: tag.children
elif token == '~':
# Run the next token as a CSS selector against the
# siblings of each tag in the current context.
recursive_candidate_generator = lambda tag: tag.next_siblings
elif token == '+':
# For each tag in the current context, run the next
# token as a CSS selector against the tag's next
# sibling that's a tag.
def next_tag_sibling(tag):
yield tag.find_next_sibling(True)
recursive_candidate_generator = next_tag_sibling
elif self.tag_name_re.match(token):
# Just a tag name.
tag_name = token
else:
raise ValueError(
'Unsupported or invalid CSS selector: "%s"' % token)
if recursive_candidate_generator:
# This happens when the selector looks like "> foo".
#
# The generator calls select() recursively on every
# member of the current context, passing in a different
# candidate generator and a different selector.
#
# In the case of "> foo", the candidate generator is
# one that yields a tag's direct children (">"), and
# the selector is "foo".
next_token = tokens[index+1]
def recursive_select(tag):
if self._select_debug:
print ' Calling select("%s") recursively on %s %s' % (next_token, tag.name, tag.attrs)
print '-' * 40
for i in tag.select(next_token, recursive_candidate_generator):
if self._select_debug:
print '(Recursive select picked up candidate %s %s)' % (i.name, i.attrs)
yield i
if self._select_debug:
print '-' * 40
_use_candidate_generator = recursive_select
elif _candidate_generator is None:
# By default, a tag's candidates are all of its
# children. If tag_name is defined, only yield tags
# with that name.
if self._select_debug:
if tag_name:
check = "[any]"
else:
check = tag_name
print ' Default candidate generator, tag name="%s"' % check
if self._select_debug:
# This is redundant with later code, but it stops
# a bunch of bogus tags from cluttering up the
# debug log.
def default_candidate_generator(tag):
for child in tag.descendants:
if not isinstance(child, Tag):
continue
if tag_name and not child.name == tag_name:
continue
yield child
_use_candidate_generator = default_candidate_generator
else:
_use_candidate_generator = lambda tag: tag.descendants
else:
_use_candidate_generator = _candidate_generator
new_context = []
new_context_ids = set([])
for tag in current_context:
if self._select_debug:
print " Running candidate generator on %s %s" % (
tag.name, repr(tag.attrs))
for candidate in _use_candidate_generator(tag):
if not isinstance(candidate, Tag):
continue
if tag_name and candidate.name != tag_name:
continue
if checker is not None:
try:
result = checker(candidate)
except StopIteration:
# The checker has decided we should no longer
# run the generator.
break
if checker is None or result:
if self._select_debug:
print " SUCCESS %s %s" % (candidate.name, repr(candidate.attrs))
if id(candidate) not in new_context_ids:
# If a tag matches a selector more than once,
# don't include it in the context more than once.
new_context.append(candidate)
new_context_ids.add(id(candidate))
elif self._select_debug:
print " FAILURE %s %s" % (candidate.name, repr(candidate.attrs))
current_context = new_context
if self._select_debug:
print "Final verdict:"
for i in current_context:
print " %s %s" % (i.name, i.attrs)
return current_context
# Old names for backwards compatibility
def childGenerator(self):
return self.children
@@ -1150,10 +1405,13 @@ class Tag(PageElement):
def recursiveChildGenerator(self):
return self.descendants
# This was kind of misleading because has_key() (attributes) was
# different from __in__ (contents). has_key() is gone in Python 3,
# anyway.
has_key = has_attr
def has_key(self, key):
"""This was kind of misleading because has_key() (attributes)
was different from __in__ (contents). has_key() is gone in
Python 3, anyway."""
warnings.warn('has_key is deprecated. Use has_attr("%s") instead.' % (
key))
return self.has_attr(key)
# Next, a couple classes to represent queries and their results.
class SoupStrainer(object):
@@ -1168,6 +1426,12 @@ class SoupStrainer(object):
kwargs['class'] = attrs
attrs = None
if 'class_' in kwargs:
# Treat class_="foo" as a search for the 'class'
# attribute, overriding any non-dict value for attrs.
kwargs['class'] = kwargs['class_']
del kwargs['class_']
if kwargs:
if attrs:
attrs = attrs.copy()
@@ -1342,6 +1606,6 @@ class SoupStrainer(object):
class ResultSet(list):
"""A ResultSet is just a list that keeps track of the SoupStrainer
that created it."""
def __init__(self, source):
list.__init__([])
def __init__(self, source, result=()):
super(ResultSet, self).__init__(result)
self.source = source

View File

@@ -81,6 +81,11 @@ class HTMLTreeBuilderSmokeTest(object):
self.assertDoctypeHandled(
'html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"')
def test_empty_doctype(self):
soup = self.soup("<!DOCTYPE>")
doctype = soup.contents[0]
self.assertEqual("", doctype.strip())
def test_public_doctype_with_url(self):
doctype = 'html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"'
self.assertDoctypeHandled(doctype)
@@ -159,6 +164,12 @@ class HTMLTreeBuilderSmokeTest(object):
comment = soup.find(text="foobar")
self.assertEqual(comment.__class__, Comment)
# The comment is properly integrated into the tree.
foo = soup.find(text="foo")
self.assertEqual(comment, foo.next_element)
baz = soup.find(text="baz")
self.assertEqual(comment, baz.previous_element)
def test_preserved_whitespace_in_pre_and_textarea(self):
"""Whitespace must be preserved in <pre> and <textarea> tags."""
self.assertSoupEquals("<pre> </pre>")
@@ -202,6 +213,14 @@ class HTMLTreeBuilderSmokeTest(object):
"<tbody><tr><td>Bar</td></tr></tbody>"
"<tfoot><tr><td>Baz</td></tr></tfoot></table>")
def test_deeply_nested_multivalued_attribute(self):
# html5lib can set the attributes of the same tag many times
# as it rearranges the tree. This has caused problems with
# multivalued attributes.
markup = '<table><div><div class="css"></div></div></table>'
soup = self.soup(markup)
self.assertEqual(["css"], soup.div.div['class'])
def test_angle_brackets_in_attribute_values_are_escaped(self):
self.assertSoupEquals('<a b="<a>"></a>', '<a b="&lt;a&gt;"></a>')
@@ -209,12 +228,14 @@ class HTMLTreeBuilderSmokeTest(object):
expect = u'<p id="pi\N{LATIN SMALL LETTER N WITH TILDE}ata"></p>'
self.assertSoupEquals('<p id="pi&#241;ata"></p>', expect)
self.assertSoupEquals('<p id="pi&#xf1;ata"></p>', expect)
self.assertSoupEquals('<p id="pi&#Xf1;ata"></p>', expect)
self.assertSoupEquals('<p id="pi&ntilde;ata"></p>', expect)
def test_entities_in_text_converted_to_unicode(self):
expect = u'<p>pi\N{LATIN SMALL LETTER N WITH TILDE}ata</p>'
self.assertSoupEquals("<p>pi&#241;ata</p>", expect)
self.assertSoupEquals("<p>pi&#xf1;ata</p>", expect)
self.assertSoupEquals("<p>pi&#Xf1;ata</p>", expect)
self.assertSoupEquals("<p>pi&ntilde;ata</p>", expect)
def test_quot_entity_converted_to_quotation_mark(self):
@@ -227,6 +248,12 @@ class HTMLTreeBuilderSmokeTest(object):
self.assertSoupEquals("&#x10000000000000;", expect)
self.assertSoupEquals("&#1000000000;", expect)
def test_multipart_strings(self):
"Mostly to prevent a recurrence of a bug in the html5lib treebuilder."
soup = self.soup("<html><h2>\nfoo</h2><p></p></html>")
self.assertEqual("p", soup.h2.string.next_element.name)
self.assertEqual("p", soup.p.name)
def test_basic_namespaces(self):
"""Parsers don't need to *understand* namespaces, but at the
very least they should not choke on namespaces or lose
@@ -254,6 +281,14 @@ class HTMLTreeBuilderSmokeTest(object):
# to detect any differences between them.
#
def test_can_parse_unicode_document(self):
# A seemingly innocuous document... but it's in Unicode! And
# it contains characters that can't be represented in the
# encoding found in the declaration! The horror!
markup = u'<html><head><meta encoding="euc-jp"></head><body>Sacr\N{LATIN SMALL LETTER E WITH ACUTE} bleu!</body>'
soup = self.soup(markup)
self.assertEqual(u'Sacr\xe9 bleu!', soup.body.string)
def test_soupstrainer(self):
"""Parsers should be able to work with SoupStrainers."""
strainer = SoupStrainer("b")
@@ -445,6 +480,28 @@ class XMLTreeBuilderSmokeTest(object):
self.assertEqual(
soup.encode("utf-8"), markup)
def test_formatter_processes_script_tag_for_xml_documents(self):
doc = """
<script type="text/javascript">
</script>
"""
soup = BeautifulSoup(doc, "xml")
# lxml would have stripped this while parsing, but we can add
# it later.
soup.script.string = 'console.log("< < hey > > ");'
encoded = soup.encode()
self.assertTrue(b"&lt; &lt; hey &gt; &gt;" in encoded)
def test_can_parse_unicode_document(self):
markup = u'<?xml version="1.0" encoding="euc-jp"><root>Sacr\N{LATIN SMALL LETTER E WITH ACUTE} bleu!</root>'
soup = self.soup(markup)
self.assertEqual(u'Sacr\xe9 bleu!', soup.root.string)
def test_popping_namespaced_tag(self):
markup = '<rss xmlns:dc="foo"><dc:creator>b</dc:creator><dc:date>2012-07-02T20:33:42Z</dc:date><dc:rights>c</dc:rights><image>d</image></rss>'
soup = self.soup(markup)
self.assertEqual(
unicode(soup.rss), markup)
def test_docstring_includes_correct_encoding(self):
soup = self.soup("<root/>")
@@ -472,6 +529,20 @@ class XMLTreeBuilderSmokeTest(object):
self.assertEqual("http://example.com/", root['xmlns:a'])
self.assertEqual("http://example.net/", root['xmlns:b'])
def test_closing_namespaced_tag(self):
markup = '<p xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:date>20010504</dc:date></p>'
soup = self.soup(markup)
self.assertEqual(unicode(soup.p), markup)
def test_namespaced_attributes(self):
markup = '<foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><bar xsi:schemaLocation="http://www.example.com"/></foo>'
soup = self.soup(markup)
self.assertEqual(unicode(soup.foo), markup)
def test_namespaced_attributes_xml_namespace(self):
markup = '<foo xml:lang="fr">bar</foo>'
soup = self.soup(markup)
self.assertEqual(unicode(soup.foo), markup)
class HTML5TreeBuilderSmokeTest(HTMLTreeBuilderSmokeTest):
"""Smoke test for a tree builder that supports HTML5."""
@@ -501,6 +572,12 @@ class HTML5TreeBuilderSmokeTest(HTMLTreeBuilderSmokeTest):
self.assertEqual(namespace, soup.math.namespace)
self.assertEqual(namespace, soup.msqrt.namespace)
def test_xml_declaration_becomes_comment(self):
markup = '<?xml version="1.0" encoding="utf-8"?><html></html>'
soup = self.soup(markup)
self.assertTrue(isinstance(soup.contents[0], Comment))
self.assertEqual(soup.contents[0], '?xml version="1.0" encoding="utf-8"?')
self.assertEqual("html", soup.contents[0].next_element.name)
def skipIf(condition, reason):
def nothing(test, *args, **kwargs):

View File

@@ -1,509 +0,0 @@
import re
import hashlib
import time
import StringIO
__version__ = '0.8'
#GNTP/<version> <messagetype> <encryptionAlgorithmID>[:<ivValue>][ <keyHashAlgorithmID>:<keyHash>.<salt>]
GNTP_INFO_LINE = re.compile(
'GNTP/(?P<version>\d+\.\d+) (?P<messagetype>REGISTER|NOTIFY|SUBSCRIBE|\-OK|\-ERROR)' +
' (?P<encryptionAlgorithmID>[A-Z0-9]+(:(?P<ivValue>[A-F0-9]+))?) ?' +
'((?P<keyHashAlgorithmID>[A-Z0-9]+):(?P<keyHash>[A-F0-9]+).(?P<salt>[A-F0-9]+))?\r\n',
re.IGNORECASE
)
GNTP_INFO_LINE_SHORT = re.compile(
'GNTP/(?P<version>\d+\.\d+) (?P<messagetype>REGISTER|NOTIFY|SUBSCRIBE|\-OK|\-ERROR)',
re.IGNORECASE
)
GNTP_HEADER = re.compile('([\w-]+):(.+)')
GNTP_EOL = '\r\n'
class BaseError(Exception):
def gntp_error(self):
error = GNTPError(self.errorcode, self.errordesc)
return error.encode()
class ParseError(BaseError):
errorcode = 500
errordesc = 'Error parsing the message'
class AuthError(BaseError):
errorcode = 400
errordesc = 'Error with authorization'
class UnsupportedError(BaseError):
errorcode = 500
errordesc = 'Currently unsupported by gntp.py'
class _GNTPBuffer(StringIO.StringIO):
"""GNTP Buffer class"""
def writefmt(self, message = "", *args):
"""Shortcut function for writing GNTP Headers"""
self.write((message % args).encode('utf8', 'replace'))
self.write(GNTP_EOL)
class _GNTPBase(object):
"""Base initilization
:param string messagetype: GNTP Message type
:param string version: GNTP Protocol version
:param string encription: Encryption protocol
"""
def __init__(self, messagetype = None, version = '1.0', encryption = None):
self.info = {
'version': version,
'messagetype': messagetype,
'encryptionAlgorithmID': encryption
}
self.headers = {}
self.resources = {}
def __str__(self):
return self.encode()
def _parse_info(self, data):
"""Parse the first line of a GNTP message to get security and other info values
:param string data: GNTP Message
:return dict: Parsed GNTP Info line
"""
match = GNTP_INFO_LINE.match(data)
if not match:
raise ParseError('ERROR_PARSING_INFO_LINE')
info = match.groupdict()
if info['encryptionAlgorithmID'] == 'NONE':
info['encryptionAlgorithmID'] = None
return info
def set_password(self, password, encryptAlgo = 'MD5'):
"""Set a password for a GNTP Message
:param string password: Null to clear password
:param string encryptAlgo: Supports MD5, SHA1, SHA256, SHA512
"""
hash = {
'MD5': hashlib.md5,
'SHA1': hashlib.sha1,
'SHA256': hashlib.sha256,
'SHA512': hashlib.sha512,
}
self.password = password
self.encryptAlgo = encryptAlgo.upper()
if not password:
self.info['encryptionAlgorithmID'] = None
self.info['keyHashAlgorithm'] = None
return
if not self.encryptAlgo in hash.keys():
raise UnsupportedError('INVALID HASH "%s"' % self.encryptAlgo)
hashfunction = hash.get(self.encryptAlgo)
password = password.encode('utf8')
seed = time.ctime()
salt = hashfunction(seed).hexdigest()
saltHash = hashfunction(seed).digest()
keyBasis = password + saltHash
key = hashfunction(keyBasis).digest()
keyHash = hashfunction(key).hexdigest()
self.info['keyHashAlgorithmID'] = self.encryptAlgo
self.info['keyHash'] = keyHash.upper()
self.info['salt'] = salt.upper()
def _decode_hex(self, value):
"""Helper function to decode hex string to `proper` hex string
:param string value: Human readable hex string
:return string: Hex string
"""
result = ''
for i in range(0, len(value), 2):
tmp = int(value[i:i + 2], 16)
result += chr(tmp)
return result
def _decode_binary(self, rawIdentifier, identifier):
rawIdentifier += '\r\n\r\n'
dataLength = int(identifier['Length'])
pointerStart = self.raw.find(rawIdentifier) + len(rawIdentifier)
pointerEnd = pointerStart + dataLength
data = self.raw[pointerStart:pointerEnd]
if not len(data) == dataLength:
raise ParseError('INVALID_DATA_LENGTH Expected: %s Recieved %s' % (dataLength, len(data)))
return data
def _validate_password(self, password):
"""Validate GNTP Message against stored password"""
self.password = password
if password == None:
raise AuthError('Missing password')
keyHash = self.info.get('keyHash', None)
if keyHash is None and self.password is None:
return True
if keyHash is None:
raise AuthError('Invalid keyHash')
if self.password is None:
raise AuthError('Missing password')
password = self.password.encode('utf8')
saltHash = self._decode_hex(self.info['salt'])
keyBasis = password + saltHash
key = hashlib.md5(keyBasis).digest()
keyHash = hashlib.md5(key).hexdigest()
if not keyHash.upper() == self.info['keyHash'].upper():
raise AuthError('Invalid Hash')
return True
def validate(self):
"""Verify required headers"""
for header in self._requiredHeaders:
if not self.headers.get(header, False):
raise ParseError('Missing Notification Header: ' + header)
def _format_info(self):
"""Generate info line for GNTP Message
:return string:
"""
info = u'GNTP/%s %s' % (
self.info.get('version'),
self.info.get('messagetype'),
)
if self.info.get('encryptionAlgorithmID', None):
info += ' %s:%s' % (
self.info.get('encryptionAlgorithmID'),
self.info.get('ivValue'),
)
else:
info += ' NONE'
if self.info.get('keyHashAlgorithmID', None):
info += ' %s:%s.%s' % (
self.info.get('keyHashAlgorithmID'),
self.info.get('keyHash'),
self.info.get('salt')
)
return info
def _parse_dict(self, data):
"""Helper function to parse blocks of GNTP headers into a dictionary
:param string data:
:return dict:
"""
dict = {}
for line in data.split('\r\n'):
match = GNTP_HEADER.match(line)
if not match:
continue
key = unicode(match.group(1).strip(), 'utf8', 'replace')
val = unicode(match.group(2).strip(), 'utf8', 'replace')
dict[key] = val
return dict
def add_header(self, key, value):
if isinstance(value, unicode):
self.headers[key] = value
else:
self.headers[key] = unicode('%s' % value, 'utf8', 'replace')
def add_resource(self, data):
"""Add binary resource
:param string data: Binary Data
"""
identifier = hashlib.md5(data).hexdigest()
self.resources[identifier] = data
return 'x-growl-resource://%s' % identifier
def decode(self, data, password = None):
"""Decode GNTP Message
:param string data:
"""
self.password = password
self.raw = data
parts = self.raw.split('\r\n\r\n')
self.info = self._parse_info(data)
self.headers = self._parse_dict(parts[0])
def encode(self):
"""Encode a generic GNTP Message
:return string: GNTP Message ready to be sent
"""
buffer = _GNTPBuffer()
buffer.writefmt(self._format_info())
#Headers
for k, v in self.headers.iteritems():
buffer.writefmt('%s: %s', k, v)
buffer.writefmt()
#Resources
for resource, data in self.resources.iteritems():
buffer.writefmt('Identifier: %s', resource)
buffer.writefmt('Length: %d', len(data))
buffer.writefmt()
buffer.write(data)
buffer.writefmt()
buffer.writefmt()
return buffer.getvalue()
class GNTPRegister(_GNTPBase):
"""Represents a GNTP Registration Command
:param string data: (Optional) See decode()
:param string password: (Optional) Password to use while encoding/decoding messages
"""
_requiredHeaders = [
'Application-Name',
'Notifications-Count'
]
_requiredNotificationHeaders = ['Notification-Name']
def __init__(self, data = None, password = None):
_GNTPBase.__init__(self, 'REGISTER')
self.notifications = []
if data:
self.decode(data, password)
else:
self.set_password(password)
self.add_header('Application-Name', 'pygntp')
self.add_header('Notifications-Count', 0)
def validate(self):
'''Validate required headers and validate notification headers'''
for header in self._requiredHeaders:
if not self.headers.get(header, False):
raise ParseError('Missing Registration Header: ' + header)
for notice in self.notifications:
for header in self._requiredNotificationHeaders:
if not notice.get(header, False):
raise ParseError('Missing Notification Header: ' + header)
def decode(self, data, password):
"""Decode existing GNTP Registration message
:param string data: Message to decode
"""
self.raw = data
parts = self.raw.split('\r\n\r\n')
self.info = self._parse_info(data)
self._validate_password(password)
self.headers = self._parse_dict(parts[0])
for i, part in enumerate(parts):
if i == 0:
continue # Skip Header
if part.strip() == '':
continue
notice = self._parse_dict(part)
if notice.get('Notification-Name', False):
self.notifications.append(notice)
elif notice.get('Identifier', False):
notice['Data'] = self._decode_binary(part, notice)
#open('register.png','wblol').write(notice['Data'])
self.resources[notice.get('Identifier')] = notice
def add_notification(self, name, enabled = True):
"""Add new Notification to Registration message
:param string name: Notification Name
:param boolean enabled: Enable this notification by default
"""
notice = {}
notice['Notification-Name'] = u'%s' % name
notice['Notification-Enabled'] = u'%s' % enabled
self.notifications.append(notice)
self.add_header('Notifications-Count', len(self.notifications))
def encode(self):
"""Encode a GNTP Registration Message
:return string: Encoded GNTP Registration message
"""
buffer = _GNTPBuffer()
buffer.writefmt(self._format_info())
#Headers
for k, v in self.headers.iteritems():
buffer.writefmt('%s: %s', k, v)
buffer.writefmt()
#Notifications
if len(self.notifications) > 0:
for notice in self.notifications:
for k, v in notice.iteritems():
buffer.writefmt('%s: %s', k, v)
buffer.writefmt()
#Resources
for resource, data in self.resources.iteritems():
buffer.writefmt('Identifier: %s', resource)
buffer.writefmt('Length: %d', len(data))
buffer.writefmt()
buffer.write(data)
buffer.writefmt()
buffer.writefmt()
return buffer.getvalue()
class GNTPNotice(_GNTPBase):
"""Represents a GNTP Notification Command
:param string data: (Optional) See decode()
:param string app: (Optional) Set Application-Name
:param string name: (Optional) Set Notification-Name
:param string title: (Optional) Set Notification Title
:param string password: (Optional) Password to use while encoding/decoding messages
"""
_requiredHeaders = [
'Application-Name',
'Notification-Name',
'Notification-Title'
]
def __init__(self, data = None, app = None, name = None, title = None, password = None):
_GNTPBase.__init__(self, 'NOTIFY')
if data:
self.decode(data, password)
else:
self.set_password(password)
if app:
self.add_header('Application-Name', app)
if name:
self.add_header('Notification-Name', name)
if title:
self.add_header('Notification-Title', title)
def decode(self, data, password):
"""Decode existing GNTP Notification message
:param string data: Message to decode.
"""
self.raw = data
parts = self.raw.split('\r\n\r\n')
self.info = self._parse_info(data)
self._validate_password(password)
self.headers = self._parse_dict(parts[0])
for i, part in enumerate(parts):
if i == 0:
continue # Skip Header
if part.strip() == '':
continue
notice = self._parse_dict(part)
if notice.get('Identifier', False):
notice['Data'] = self._decode_binary(part, notice)
#open('notice.png','wblol').write(notice['Data'])
self.resources[notice.get('Identifier')] = notice
class GNTPSubscribe(_GNTPBase):
"""Represents a GNTP Subscribe Command
:param string data: (Optional) See decode()
:param string password: (Optional) Password to use while encoding/decoding messages
"""
_requiredHeaders = [
'Subscriber-ID',
'Subscriber-Name',
]
def __init__(self, data = None, password = None):
_GNTPBase.__init__(self, 'SUBSCRIBE')
if data:
self.decode(data, password)
else:
self.set_password(password)
class GNTPOK(_GNTPBase):
"""Represents a GNTP OK Response
:param string data: (Optional) See _GNTPResponse.decode()
:param string action: (Optional) Set type of action the OK Response is for
"""
_requiredHeaders = ['Response-Action']
def __init__(self, data = None, action = None):
_GNTPBase.__init__(self, '-OK')
if data:
self.decode(data)
if action:
self.add_header('Response-Action', action)
class GNTPError(_GNTPBase):
"""Represents a GNTP Error response
:param string data: (Optional) See _GNTPResponse.decode()
:param string errorcode: (Optional) Error code
:param string errordesc: (Optional) Error Description
"""
_requiredHeaders = ['Error-Code', 'Error-Description']
def __init__(self, data = None, errorcode = None, errordesc = None):
_GNTPBase.__init__(self, '-ERROR')
if data:
self.decode(data)
if errorcode:
self.add_header('Error-Code', errorcode)
self.add_header('Error-Description', errordesc)
def error(self):
return (self.headers.get('Error-Code', None),
self.headers.get('Error-Description', None))
def parse_gntp(data, password = None):
"""Attempt to parse a message as a GNTP message
:param string data: Message to be parsed
:param string password: Optional password to be used to verify the message
"""
match = GNTP_INFO_LINE_SHORT.match(data)
if not match:
raise ParseError('INVALID_GNTP_INFO')
info = match.groupdict()
if info['messagetype'] == 'REGISTER':
return GNTPRegister(data, password = password)
elif info['messagetype'] == 'NOTIFY':
return GNTPNotice(data, password = password)
elif info['messagetype'] == 'SUBSCRIBE':
return GNTPSubscribe(data, password = password)
elif info['messagetype'] == '-OK':
return GNTPOK(data)
elif info['messagetype'] == '-ERROR':
return GNTPError(data)
raise ParseError('INVALID_GNTP_MESSAGE')

141
libs/gntp/cli.py Normal file
View File

@@ -0,0 +1,141 @@
# Copyright: 2013 Paul Traylor
# These sources are released under the terms of the MIT license: see LICENSE
import logging
import os
import sys
from optparse import OptionParser, OptionGroup
from gntp.notifier import GrowlNotifier
from gntp.shim import RawConfigParser
from gntp.version import __version__
DEFAULT_CONFIG = os.path.expanduser('~/.gntp')
config = RawConfigParser({
'hostname': 'localhost',
'password': None,
'port': 23053,
})
config.read([DEFAULT_CONFIG])
if not config.has_section('gntp'):
config.add_section('gntp')
class ClientParser(OptionParser):
def __init__(self):
OptionParser.__init__(self, version="%%prog %s" % __version__)
group = OptionGroup(self, "Network Options")
group.add_option("-H", "--host",
dest="host", default=config.get('gntp', 'hostname'),
help="Specify a hostname to which to send a remote notification. [%default]")
group.add_option("--port",
dest="port", default=config.getint('gntp', 'port'), type="int",
help="port to listen on [%default]")
group.add_option("-P", "--password",
dest='password', default=config.get('gntp', 'password'),
help="Network password")
self.add_option_group(group)
group = OptionGroup(self, "Notification Options")
group.add_option("-n", "--name",
dest="app", default='Python GNTP Test Client',
help="Set the name of the application [%default]")
group.add_option("-s", "--sticky",
dest='sticky', default=False, action="store_true",
help="Make the notification sticky [%default]")
group.add_option("--image",
dest="icon", default=None,
help="Icon for notification (URL or /path/to/file)")
group.add_option("-m", "--message",
dest="message", default=None,
help="Sets the message instead of using stdin")
group.add_option("-p", "--priority",
dest="priority", default=0, type="int",
help="-2 to 2 [%default]")
group.add_option("-d", "--identifier",
dest="identifier",
help="Identifier for coalescing")
group.add_option("-t", "--title",
dest="title", default=None,
help="Set the title of the notification [%default]")
group.add_option("-N", "--notification",
dest="name", default='Notification',
help="Set the notification name [%default]")
group.add_option("--callback",
dest="callback",
help="URL callback")
self.add_option_group(group)
# Extra Options
self.add_option('-v', '--verbose',
dest='verbose', default=0, action='count',
help="Verbosity levels")
def parse_args(self, args=None, values=None):
values, args = OptionParser.parse_args(self, args, values)
if values.message is None:
print('Enter a message followed by Ctrl-D')
try:
message = sys.stdin.read()
except KeyboardInterrupt:
exit()
else:
message = values.message
if values.title is None:
values.title = ' '.join(args)
# If we still have an empty title, use the
# first bit of the message as the title
if values.title == '':
values.title = message[:20]
values.verbose = logging.WARNING - values.verbose * 10
return values, message
def main():
(options, message) = ClientParser().parse_args()
logging.basicConfig(level=options.verbose)
if not os.path.exists(DEFAULT_CONFIG):
logging.info('No config read found at %s', DEFAULT_CONFIG)
growl = GrowlNotifier(
applicationName=options.app,
notifications=[options.name],
defaultNotifications=[options.name],
hostname=options.host,
password=options.password,
port=options.port,
)
result = growl.register()
if result is not True:
exit(result)
# This would likely be better placed within the growl notifier
# class but until I make _checkIcon smarter this is "easier"
if options.icon is not None and not options.icon.startswith('http'):
logging.info('Loading image %s', options.icon)
f = open(options.icon)
options.icon = f.read()
f.close()
result = growl.notify(
noteType=options.name,
title=options.title,
description=message,
icon=options.icon,
sticky=options.sticky,
priority=options.priority,
callback=options.callback,
identifier=options.identifier,
)
if result is not True:
exit(result)
if __name__ == "__main__":
main()

77
libs/gntp/config.py Normal file
View File

@@ -0,0 +1,77 @@
# Copyright: 2013 Paul Traylor
# These sources are released under the terms of the MIT license: see LICENSE
"""
The gntp.config module is provided as an extended GrowlNotifier object that takes
advantage of the ConfigParser module to allow us to setup some default values
(such as hostname, password, and port) in a more global way to be shared among
programs using gntp
"""
import logging
import os
import gntp.notifier
import gntp.shim
__all__ = [
'mini',
'GrowlNotifier'
]
logger = logging.getLogger(__name__)
class GrowlNotifier(gntp.notifier.GrowlNotifier):
"""
ConfigParser enhanced GrowlNotifier object
For right now, we are only interested in letting users overide certain
values from ~/.gntp
::
[gntp]
hostname = ?
password = ?
port = ?
"""
def __init__(self, *args, **kwargs):
config = gntp.shim.RawConfigParser({
'hostname': kwargs.get('hostname', 'localhost'),
'password': kwargs.get('password'),
'port': kwargs.get('port', 23053),
})
config.read([os.path.expanduser('~/.gntp')])
# If the file does not exist, then there will be no gntp section defined
# and the config.get() lines below will get confused. Since we are not
# saving the config, it should be safe to just add it here so the
# code below doesn't complain
if not config.has_section('gntp'):
logger.info('Error reading ~/.gntp config file')
config.add_section('gntp')
kwargs['password'] = config.get('gntp', 'password')
kwargs['hostname'] = config.get('gntp', 'hostname')
kwargs['port'] = config.getint('gntp', 'port')
super(GrowlNotifier, self).__init__(*args, **kwargs)
def mini(description, **kwargs):
"""Single notification function
Simple notification function in one line. Has only one required parameter
and attempts to use reasonable defaults for everything else
:param string description: Notification message
"""
kwargs['notifierFactory'] = GrowlNotifier
gntp.notifier.mini(description, **kwargs)
if __name__ == '__main__':
# If we're running this module directly we're likely running it as a test
# so extra debugging is useful
logging.basicConfig(level=logging.INFO)
mini('Testing mini notification')

511
libs/gntp/core.py Normal file
View File

@@ -0,0 +1,511 @@
# Copyright: 2013 Paul Traylor
# These sources are released under the terms of the MIT license: see LICENSE
import hashlib
import re
import time
import gntp.shim
import gntp.errors as errors
__all__ = [
'GNTPRegister',
'GNTPNotice',
'GNTPSubscribe',
'GNTPOK',
'GNTPError',
'parse_gntp',
]
#GNTP/<version> <messagetype> <encryptionAlgorithmID>[:<ivValue>][ <keyHashAlgorithmID>:<keyHash>.<salt>]
GNTP_INFO_LINE = re.compile(
'GNTP/(?P<version>\d+\.\d+) (?P<messagetype>REGISTER|NOTIFY|SUBSCRIBE|\-OK|\-ERROR)' +
' (?P<encryptionAlgorithmID>[A-Z0-9]+(:(?P<ivValue>[A-F0-9]+))?) ?' +
'((?P<keyHashAlgorithmID>[A-Z0-9]+):(?P<keyHash>[A-F0-9]+).(?P<salt>[A-F0-9]+))?\r\n',
re.IGNORECASE
)
GNTP_INFO_LINE_SHORT = re.compile(
'GNTP/(?P<version>\d+\.\d+) (?P<messagetype>REGISTER|NOTIFY|SUBSCRIBE|\-OK|\-ERROR)',
re.IGNORECASE
)
GNTP_HEADER = re.compile('([\w-]+):(.+)')
GNTP_EOL = gntp.shim.b('\r\n')
GNTP_SEP = gntp.shim.b(': ')
class _GNTPBuffer(gntp.shim.StringIO):
"""GNTP Buffer class"""
def writeln(self, value=None):
if value:
self.write(gntp.shim.b(value))
self.write(GNTP_EOL)
def writeheader(self, key, value):
if not isinstance(value, str):
value = str(value)
self.write(gntp.shim.b(key))
self.write(GNTP_SEP)
self.write(gntp.shim.b(value))
self.write(GNTP_EOL)
class _GNTPBase(object):
"""Base initilization
:param string messagetype: GNTP Message type
:param string version: GNTP Protocol version
:param string encription: Encryption protocol
"""
def __init__(self, messagetype=None, version='1.0', encryption=None):
self.info = {
'version': version,
'messagetype': messagetype,
'encryptionAlgorithmID': encryption
}
self.hash_algo = {
'MD5': hashlib.md5,
'SHA1': hashlib.sha1,
'SHA256': hashlib.sha256,
'SHA512': hashlib.sha512,
}
self.headers = {}
self.resources = {}
def __str__(self):
return self.encode()
def _parse_info(self, data):
"""Parse the first line of a GNTP message to get security and other info values
:param string data: GNTP Message
:return dict: Parsed GNTP Info line
"""
match = GNTP_INFO_LINE.match(data)
if not match:
raise errors.ParseError('ERROR_PARSING_INFO_LINE')
info = match.groupdict()
if info['encryptionAlgorithmID'] == 'NONE':
info['encryptionAlgorithmID'] = None
return info
def set_password(self, password, encryptAlgo='MD5'):
"""Set a password for a GNTP Message
:param string password: Null to clear password
:param string encryptAlgo: Supports MD5, SHA1, SHA256, SHA512
"""
if not password:
self.info['encryptionAlgorithmID'] = None
self.info['keyHashAlgorithm'] = None
return
self.password = gntp.shim.b(password)
self.encryptAlgo = encryptAlgo.upper()
if not self.encryptAlgo in self.hash_algo:
raise errors.UnsupportedError('INVALID HASH "%s"' % self.encryptAlgo)
hashfunction = self.hash_algo.get(self.encryptAlgo)
password = password.encode('utf8')
seed = time.ctime().encode('utf8')
salt = hashfunction(seed).hexdigest()
saltHash = hashfunction(seed).digest()
keyBasis = password + saltHash
key = hashfunction(keyBasis).digest()
keyHash = hashfunction(key).hexdigest()
self.info['keyHashAlgorithmID'] = self.encryptAlgo
self.info['keyHash'] = keyHash.upper()
self.info['salt'] = salt.upper()
def _decode_hex(self, value):
"""Helper function to decode hex string to `proper` hex string
:param string value: Human readable hex string
:return string: Hex string
"""
result = ''
for i in range(0, len(value), 2):
tmp = int(value[i:i + 2], 16)
result += chr(tmp)
return result
def _decode_binary(self, rawIdentifier, identifier):
rawIdentifier += '\r\n\r\n'
dataLength = int(identifier['Length'])
pointerStart = self.raw.find(rawIdentifier) + len(rawIdentifier)
pointerEnd = pointerStart + dataLength
data = self.raw[pointerStart:pointerEnd]
if not len(data) == dataLength:
raise errors.ParseError('INVALID_DATA_LENGTH Expected: %s Recieved %s' % (dataLength, len(data)))
return data
def _validate_password(self, password):
"""Validate GNTP Message against stored password"""
self.password = password
if password is None:
raise errors.AuthError('Missing password')
keyHash = self.info.get('keyHash', None)
if keyHash is None and self.password is None:
return True
if keyHash is None:
raise errors.AuthError('Invalid keyHash')
if self.password is None:
raise errors.AuthError('Missing password')
keyHashAlgorithmID = self.info.get('keyHashAlgorithmID','MD5')
password = self.password.encode('utf8')
saltHash = self._decode_hex(self.info['salt'])
keyBasis = password + saltHash
self.key = self.hash_algo[keyHashAlgorithmID](keyBasis).digest()
keyHash = self.hash_algo[keyHashAlgorithmID](self.key).hexdigest()
if not keyHash.upper() == self.info['keyHash'].upper():
raise errors.AuthError('Invalid Hash')
return True
def validate(self):
"""Verify required headers"""
for header in self._requiredHeaders:
if not self.headers.get(header, False):
raise errors.ParseError('Missing Notification Header: ' + header)
def _format_info(self):
"""Generate info line for GNTP Message
:return string:
"""
info = 'GNTP/%s %s' % (
self.info.get('version'),
self.info.get('messagetype'),
)
if self.info.get('encryptionAlgorithmID', None):
info += ' %s:%s' % (
self.info.get('encryptionAlgorithmID'),
self.info.get('ivValue'),
)
else:
info += ' NONE'
if self.info.get('keyHashAlgorithmID', None):
info += ' %s:%s.%s' % (
self.info.get('keyHashAlgorithmID'),
self.info.get('keyHash'),
self.info.get('salt')
)
return info
def _parse_dict(self, data):
"""Helper function to parse blocks of GNTP headers into a dictionary
:param string data:
:return dict: Dictionary of parsed GNTP Headers
"""
d = {}
for line in data.split('\r\n'):
match = GNTP_HEADER.match(line)
if not match:
continue
key = match.group(1).strip()
val = match.group(2).strip()
d[key] = val
return d
def add_header(self, key, value):
self.headers[key] = value
def add_resource(self, data):
"""Add binary resource
:param string data: Binary Data
"""
data = gntp.shim.b(data)
identifier = hashlib.md5(data).hexdigest()
self.resources[identifier] = data
return 'x-growl-resource://%s' % identifier
def decode(self, data, password=None):
"""Decode GNTP Message
:param string data:
"""
self.password = password
self.raw = gntp.shim.u(data)
parts = self.raw.split('\r\n\r\n')
self.info = self._parse_info(self.raw)
self.headers = self._parse_dict(parts[0])
def encode(self):
"""Encode a generic GNTP Message
:return string: GNTP Message ready to be sent. Returned as a byte string
"""
buff = _GNTPBuffer()
buff.writeln(self._format_info())
#Headers
for k, v in self.headers.items():
buff.writeheader(k, v)
buff.writeln()
#Resources
for resource, data in self.resources.items():
buff.writeheader('Identifier', resource)
buff.writeheader('Length', len(data))
buff.writeln()
buff.write(data)
buff.writeln()
buff.writeln()
return buff.getvalue()
class GNTPRegister(_GNTPBase):
"""Represents a GNTP Registration Command
:param string data: (Optional) See decode()
:param string password: (Optional) Password to use while encoding/decoding messages
"""
_requiredHeaders = [
'Application-Name',
'Notifications-Count'
]
_requiredNotificationHeaders = ['Notification-Name']
def __init__(self, data=None, password=None):
_GNTPBase.__init__(self, 'REGISTER')
self.notifications = []
if data:
self.decode(data, password)
else:
self.set_password(password)
self.add_header('Application-Name', 'pygntp')
self.add_header('Notifications-Count', 0)
def validate(self):
'''Validate required headers and validate notification headers'''
for header in self._requiredHeaders:
if not self.headers.get(header, False):
raise errors.ParseError('Missing Registration Header: ' + header)
for notice in self.notifications:
for header in self._requiredNotificationHeaders:
if not notice.get(header, False):
raise errors.ParseError('Missing Notification Header: ' + header)
def decode(self, data, password):
"""Decode existing GNTP Registration message
:param string data: Message to decode
"""
self.raw = gntp.shim.u(data)
parts = self.raw.split('\r\n\r\n')
self.info = self._parse_info(self.raw)
self._validate_password(password)
self.headers = self._parse_dict(parts[0])
for i, part in enumerate(parts):
if i == 0:
continue # Skip Header
if part.strip() == '':
continue
notice = self._parse_dict(part)
if notice.get('Notification-Name', False):
self.notifications.append(notice)
elif notice.get('Identifier', False):
notice['Data'] = self._decode_binary(part, notice)
#open('register.png','wblol').write(notice['Data'])
self.resources[notice.get('Identifier')] = notice
def add_notification(self, name, enabled=True):
"""Add new Notification to Registration message
:param string name: Notification Name
:param boolean enabled: Enable this notification by default
"""
notice = {}
notice['Notification-Name'] = name
notice['Notification-Enabled'] = enabled
self.notifications.append(notice)
self.add_header('Notifications-Count', len(self.notifications))
def encode(self):
"""Encode a GNTP Registration Message
:return string: Encoded GNTP Registration message. Returned as a byte string
"""
buff = _GNTPBuffer()
buff.writeln(self._format_info())
#Headers
for k, v in self.headers.items():
buff.writeheader(k, v)
buff.writeln()
#Notifications
if len(self.notifications) > 0:
for notice in self.notifications:
for k, v in notice.items():
buff.writeheader(k, v)
buff.writeln()
#Resources
for resource, data in self.resources.items():
buff.writeheader('Identifier', resource)
buff.writeheader('Length', len(data))
buff.writeln()
buff.write(data)
buff.writeln()
buff.writeln()
return buff.getvalue()
class GNTPNotice(_GNTPBase):
"""Represents a GNTP Notification Command
:param string data: (Optional) See decode()
:param string app: (Optional) Set Application-Name
:param string name: (Optional) Set Notification-Name
:param string title: (Optional) Set Notification Title
:param string password: (Optional) Password to use while encoding/decoding messages
"""
_requiredHeaders = [
'Application-Name',
'Notification-Name',
'Notification-Title'
]
def __init__(self, data=None, app=None, name=None, title=None, password=None):
_GNTPBase.__init__(self, 'NOTIFY')
if data:
self.decode(data, password)
else:
self.set_password(password)
if app:
self.add_header('Application-Name', app)
if name:
self.add_header('Notification-Name', name)
if title:
self.add_header('Notification-Title', title)
def decode(self, data, password):
"""Decode existing GNTP Notification message
:param string data: Message to decode.
"""
self.raw = gntp.shim.u(data)
parts = self.raw.split('\r\n\r\n')
self.info = self._parse_info(self.raw)
self._validate_password(password)
self.headers = self._parse_dict(parts[0])
for i, part in enumerate(parts):
if i == 0:
continue # Skip Header
if part.strip() == '':
continue
notice = self._parse_dict(part)
if notice.get('Identifier', False):
notice['Data'] = self._decode_binary(part, notice)
#open('notice.png','wblol').write(notice['Data'])
self.resources[notice.get('Identifier')] = notice
class GNTPSubscribe(_GNTPBase):
"""Represents a GNTP Subscribe Command
:param string data: (Optional) See decode()
:param string password: (Optional) Password to use while encoding/decoding messages
"""
_requiredHeaders = [
'Subscriber-ID',
'Subscriber-Name',
]
def __init__(self, data=None, password=None):
_GNTPBase.__init__(self, 'SUBSCRIBE')
if data:
self.decode(data, password)
else:
self.set_password(password)
class GNTPOK(_GNTPBase):
"""Represents a GNTP OK Response
:param string data: (Optional) See _GNTPResponse.decode()
:param string action: (Optional) Set type of action the OK Response is for
"""
_requiredHeaders = ['Response-Action']
def __init__(self, data=None, action=None):
_GNTPBase.__init__(self, '-OK')
if data:
self.decode(data)
if action:
self.add_header('Response-Action', action)
class GNTPError(_GNTPBase):
"""Represents a GNTP Error response
:param string data: (Optional) See _GNTPResponse.decode()
:param string errorcode: (Optional) Error code
:param string errordesc: (Optional) Error Description
"""
_requiredHeaders = ['Error-Code', 'Error-Description']
def __init__(self, data=None, errorcode=None, errordesc=None):
_GNTPBase.__init__(self, '-ERROR')
if data:
self.decode(data)
if errorcode:
self.add_header('Error-Code', errorcode)
self.add_header('Error-Description', errordesc)
def error(self):
return (self.headers.get('Error-Code', None),
self.headers.get('Error-Description', None))
def parse_gntp(data, password=None):
"""Attempt to parse a message as a GNTP message
:param string data: Message to be parsed
:param string password: Optional password to be used to verify the message
"""
data = gntp.shim.u(data)
match = GNTP_INFO_LINE_SHORT.match(data)
if not match:
raise errors.ParseError('INVALID_GNTP_INFO')
info = match.groupdict()
if info['messagetype'] == 'REGISTER':
return GNTPRegister(data, password=password)
elif info['messagetype'] == 'NOTIFY':
return GNTPNotice(data, password=password)
elif info['messagetype'] == 'SUBSCRIBE':
return GNTPSubscribe(data, password=password)
elif info['messagetype'] == '-OK':
return GNTPOK(data)
elif info['messagetype'] == '-ERROR':
return GNTPError(data)
raise errors.ParseError('INVALID_GNTP_MESSAGE')

25
libs/gntp/errors.py Normal file
View File

@@ -0,0 +1,25 @@
# Copyright: 2013 Paul Traylor
# These sources are released under the terms of the MIT license: see LICENSE
class BaseError(Exception):
pass
class ParseError(BaseError):
errorcode = 500
errordesc = 'Error parsing the message'
class AuthError(BaseError):
errorcode = 400
errordesc = 'Error with authorization'
class UnsupportedError(BaseError):
errorcode = 500
errordesc = 'Currently unsupported by gntp.py'
class NetworkError(BaseError):
errorcode = 500
errordesc = "Error connecting to growl server"

View File

@@ -1,3 +1,6 @@
# Copyright: 2013 Paul Traylor
# These sources are released under the terms of the MIT license: see LICENSE
"""
The gntp.notifier module is provided as a simple way to send notifications
using GNTP
@@ -9,10 +12,15 @@ using GNTP
`Original Python bindings <http://code.google.com/p/growl/source/browse/Bindings/python/Growl.py>`_
"""
import gntp
import socket
import logging
import platform
import socket
import sys
from gntp.version import __version__
import gntp.core
import gntp.errors as errors
import gntp.shim
__all__ = [
'mini',
@@ -37,9 +45,9 @@ class GrowlNotifier(object):
passwordHash = 'MD5'
socketTimeout = 3
def __init__(self, applicationName = 'Python GNTP', notifications = [],
defaultNotifications = None, applicationIcon = None, hostname = 'localhost',
password = None, port = 23053):
def __init__(self, applicationName='Python GNTP', notifications=[],
defaultNotifications=None, applicationIcon=None, hostname='localhost',
password=None, port=23053):
self.applicationName = applicationName
self.notifications = list(notifications)
@@ -61,7 +69,7 @@ class GrowlNotifier(object):
then we return False
'''
logger.info('Checking icon')
return data.startswith('http')
return gntp.shim.u(data).startswith('http')
def register(self):
"""Send GNTP Registration
@@ -71,7 +79,7 @@ class GrowlNotifier(object):
sent a registration message at least once
"""
logger.info('Sending registration to %s:%s', self.hostname, self.port)
register = gntp.GNTPRegister()
register = gntp.core.GNTPRegister()
register.add_header('Application-Name', self.applicationName)
for notification in self.notifications:
enabled = notification in self.defaultNotifications
@@ -80,16 +88,16 @@ class GrowlNotifier(object):
if self._checkIcon(self.applicationIcon):
register.add_header('Application-Icon', self.applicationIcon)
else:
id = register.add_resource(self.applicationIcon)
register.add_header('Application-Icon', id)
resource = register.add_resource(self.applicationIcon)
register.add_header('Application-Icon', resource)
if self.password:
register.set_password(self.password, self.passwordHash)
self.add_origin_info(register)
self.register_hook(register)
return self._send('register', register)
def notify(self, noteType, title, description, icon = None, sticky = False,
priority = None, callback = None, identifier = None):
def notify(self, noteType, title, description, icon=None, sticky=False,
priority=None, callback=None, identifier=None, custom={}):
"""Send a GNTP notifications
.. warning::
@@ -102,6 +110,8 @@ class GrowlNotifier(object):
:param boolean sticky: Sticky notification
:param integer priority: Message priority level from -2 to 2
:param string callback: URL callback
:param dict custom: Custom attributes. Key names should be prefixed with X-
according to the spec but this is not enforced by this class
.. warning::
For now, only URL callbacks are supported. In the future, the
@@ -109,7 +119,7 @@ class GrowlNotifier(object):
"""
logger.info('Sending notification [%s] to %s:%s', noteType, self.hostname, self.port)
assert noteType in self.notifications
notice = gntp.GNTPNotice()
notice = gntp.core.GNTPNotice()
notice.add_header('Application-Name', self.applicationName)
notice.add_header('Notification-Name', noteType)
notice.add_header('Notification-Title', title)
@@ -123,8 +133,8 @@ class GrowlNotifier(object):
if self._checkIcon(icon):
notice.add_header('Notification-Icon', icon)
else:
id = notice.add_resource(icon)
notice.add_header('Notification-Icon', id)
resource = notice.add_resource(icon)
notice.add_header('Notification-Icon', resource)
if description:
notice.add_header('Notification-Text', description)
@@ -133,6 +143,9 @@ class GrowlNotifier(object):
if identifier:
notice.add_header('Notification-Coalescing-ID', identifier)
for key in custom:
notice.add_header(key, custom[key])
self.add_origin_info(notice)
self.notify_hook(notice)
@@ -140,7 +153,7 @@ class GrowlNotifier(object):
def subscribe(self, id, name, port):
"""Send a Subscribe request to a remote machine"""
sub = gntp.GNTPSubscribe()
sub = gntp.core.GNTPSubscribe()
sub.add_header('Subscriber-ID', id)
sub.add_header('Subscriber-Name', name)
sub.add_header('Subscriber-Port', port)
@@ -156,7 +169,7 @@ class GrowlNotifier(object):
"""Add optional Origin headers to message"""
packet.add_header('Origin-Machine-Name', platform.node())
packet.add_header('Origin-Software-Name', 'gntp.py')
packet.add_header('Origin-Software-Version', gntp.__version__)
packet.add_header('Origin-Software-Version', __version__)
packet.add_header('Origin-Platform-Name', platform.system())
packet.add_header('Origin-Platform-Version', platform.platform())
@@ -179,27 +192,33 @@ class GrowlNotifier(object):
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(self.socketTimeout)
s.connect((self.hostname, self.port))
s.send(data)
recv_data = s.recv(1024)
while not recv_data.endswith("\r\n\r\n"):
recv_data += s.recv(1024)
response = gntp.parse_gntp(recv_data)
try:
s.connect((self.hostname, self.port))
s.send(data)
recv_data = s.recv(1024)
while not recv_data.endswith(gntp.shim.b("\r\n\r\n")):
recv_data += s.recv(1024)
except socket.error:
# Python2.5 and Python3 compatibile exception
exc = sys.exc_info()[1]
raise errors.NetworkError(exc)
response = gntp.core.parse_gntp(recv_data)
s.close()
logger.debug('From : %s:%s <%s>\n%s', self.hostname, self.port, response.__class__, response)
if type(response) == gntp.GNTPOK:
if type(response) == gntp.core.GNTPOK:
return True
logger.error('Invalid response: %s', response.error())
return response.error()
def mini(description, applicationName = 'PythonMini', noteType = "Message",
title = "Mini Message", applicationIcon = None, hostname = 'localhost',
password = None, port = 23053, sticky = False, priority = None,
callback = None, notificationIcon = None, identifier = None,
notifierFactory = GrowlNotifier):
def mini(description, applicationName='PythonMini', noteType="Message",
title="Mini Message", applicationIcon=None, hostname='localhost',
password=None, port=23053, sticky=False, priority=None,
callback=None, notificationIcon=None, identifier=None,
notifierFactory=GrowlNotifier):
"""Single notification function
Simple notification function in one line. Has only one required parameter
@@ -210,32 +229,37 @@ def mini(description, applicationName = 'PythonMini', noteType = "Message",
For now, only URL callbacks are supported. In the future, the
callback argument will also support a function
"""
growl = notifierFactory(
applicationName = applicationName,
notifications = [noteType],
defaultNotifications = [noteType],
applicationIcon = applicationIcon,
hostname = hostname,
password = password,
port = port,
)
result = growl.register()
if result is not True:
return result
try:
growl = notifierFactory(
applicationName=applicationName,
notifications=[noteType],
defaultNotifications=[noteType],
applicationIcon=applicationIcon,
hostname=hostname,
password=password,
port=port,
)
result = growl.register()
if result is not True:
return result
return growl.notify(
noteType = noteType,
title = title,
description = description,
icon = notificationIcon,
sticky = sticky,
priority = priority,
callback = callback,
identifier = identifier,
)
return growl.notify(
noteType=noteType,
title=title,
description=description,
icon=notificationIcon,
sticky=sticky,
priority=priority,
callback=callback,
identifier=identifier,
)
except Exception:
# We want the "mini" function to be simple and swallow Exceptions
# in order to be less invasive
logger.exception("Growl error")
if __name__ == '__main__':
# If we're running this module directly we're likely running it as a test
# so extra debugging is useful
logging.basicConfig(level = logging.INFO)
logging.basicConfig(level=logging.INFO)
mini('Testing mini notification')

45
libs/gntp/shim.py Normal file
View File

@@ -0,0 +1,45 @@
# Copyright: 2013 Paul Traylor
# These sources are released under the terms of the MIT license: see LICENSE
"""
Python2.5 and Python3.3 compatibility shim
Heavily inspirted by the "six" library.
https://pypi.python.org/pypi/six
"""
import sys
PY3 = sys.version_info[0] == 3
if PY3:
def b(s):
if isinstance(s, bytes):
return s
return s.encode('utf8', 'replace')
def u(s):
if isinstance(s, bytes):
return s.decode('utf8', 'replace')
return s
from io import BytesIO as StringIO
from configparser import RawConfigParser
else:
def b(s):
if isinstance(s, unicode):
return s.encode('utf8', 'replace')
return s
def u(s):
if isinstance(s, unicode):
return s
if isinstance(s, int):
s = str(s)
return unicode(s, "utf8", "replace")
from StringIO import StringIO
from ConfigParser import RawConfigParser
b.__doc__ = "Ensure we have a byte string"
u.__doc__ = "Ensure we have a unicode string"

4
libs/gntp/version.py Normal file
View File

@@ -0,0 +1,4 @@
# Copyright: 2013 Paul Traylor
# These sources are released under the terms of the MIT license: see LICENSE
__version__ = '1.0.2'

View File

@@ -20,7 +20,7 @@
from __future__ import unicode_literals
__version__ = '0.7-dev'
__version__ = '0.6.2'
__all__ = ['Guess', 'Language',
'guess_file_info', 'guess_video_info',
'guess_movie_info', 'guess_episode_info']
@@ -76,6 +76,7 @@ from guessit.language import Language
from guessit.matcher import IterativeMatcher
from guessit.textutils import clean_string
import logging
import json
log = logging.getLogger(__name__)
@@ -105,17 +106,74 @@ def _guess_filename(filename, filetype):
mtree = IterativeMatcher(filename, filetype=filetype)
m = mtree.matched()
second_pass_opts = []
second_pass_transfo_opts = {}
# if there are multiple possible years found, we assume the first one is
# part of the title, reparse the tree taking this into account
years = set(n.value for n in find_nodes(mtree.match_tree, 'year'))
if len(years) >= 2:
mtree = IterativeMatcher(filename, filetype=filetype,
opts=['skip_first_year'])
second_pass_opts.append('skip_first_year')
to_skip_language_nodes = []
title_nodes = set(n for n in find_nodes(mtree.match_tree, ['title', 'series']))
title_spans = {}
for title_node in title_nodes:
title_spans[title_node.span[0]] = title_node
title_spans[title_node.span[1]] = title_node
for lang_key in ('language', 'subtitleLanguage'):
langs = {}
lang_nodes = set(n for n in find_nodes(mtree.match_tree, lang_key))
for lang_node in lang_nodes:
lang = lang_node.guess.get(lang_key, None)
if len(lang_node.value) > 3 and (lang_node.span[0] in title_spans.keys() or lang_node.span[1] in title_spans.keys()):
# Language is next or before title, and is not a language code. Add to skip for 2nd pass.
# if filetype is subtitle and the language appears last, just before
# the extension, then it is likely a subtitle language
parts = clean_string(lang_node.root.value).split()
if m['type'] in ['moviesubtitle', 'episodesubtitle'] and (parts.index(lang_node.value) == len(parts) - 2):
continue
to_skip_language_nodes.append(lang_node)
elif not lang in langs:
langs[lang] = lang_node
else:
# The same language was found. Keep the more confident one, and add others to skip for 2nd pass.
existing_lang_node = langs[lang]
to_skip = None
if existing_lang_node.guess.confidence('language') >= lang_node.guess.confidence('language'):
# lang_node is to remove
to_skip = lang_node
else:
# existing_lang_node is to remove
langs[lang] = lang_node
to_skip = existing_lang_node
to_skip_language_nodes.append(to_skip)
if to_skip_language_nodes:
second_pass_transfo_opts['guess_language'] = (
((), { 'skip': [ { 'node_idx': node.parent.node_idx,
'span': node.span }
for node in to_skip_language_nodes ] }))
if second_pass_opts or second_pass_transfo_opts:
# 2nd pass is needed
log.info("Running 2nd pass with options: %s" % second_pass_opts)
log.info("Transfo options: %s" % second_pass_transfo_opts)
mtree = IterativeMatcher(filename, filetype=filetype,
opts=second_pass_opts,
transfo_opts=second_pass_transfo_opts)
m = mtree.matched()
if 'language' not in m and 'subtitleLanguage' not in m:
if 'language' not in m and 'subtitleLanguage' not in m or 'title' not in m:
return m
# if we found some language, make sure we didn't cut a title or sth...
@@ -123,51 +181,10 @@ def _guess_filename(filename, filetype):
opts=['nolanguage', 'nocountry'])
m2 = mtree2.matched()
if m.get('title') is None:
return m
if m.get('title') != m2.get('title'):
title = next(find_nodes(mtree.match_tree, 'title'))
title2 = next(find_nodes(mtree2.match_tree, 'title'))
langs = list(find_nodes(mtree.match_tree, ['language', 'subtitleLanguage']))
if not langs:
return warning('A weird error happened with language detection')
# find the language that is likely more relevant
for lng in langs:
if lng.value in title2.value:
# if the language was detected as part of a potential title,
# look at this one in particular
lang = lng
break
else:
# pick the first one if we don't have a better choice
lang = langs[0]
# language code are rarely part of a title, and those
# should be handled by the Language exceptions anyway
if len(lang.value) <= 3:
return m
# if filetype is subtitle and the language appears last, just before
# the extension, then it is likely a subtitle language
parts = clean_string(title.root.value).split()
if (m['type'] in ['moviesubtitle', 'episodesubtitle'] and
parts.index(lang.value) == len(parts) - 2):
return m
# if the language was in the middle of the other potential title,
# keep the other title (eg: The Italian Job), except if it is at the
# very beginning, in which case we consider it an error
if m2['title'].startswith(lang.value):
return m
elif lang.value in title2.value:
return m2
# if a node is in an explicit group, then the correct title is probably
# the other one
if title.root.node_at(title.node_idx[:2]).is_explicit():
@@ -175,9 +192,6 @@ def _guess_filename(filename, filetype):
elif title2.root.node_at(title2.node_idx[:2]).is_explicit():
return m
return warning('Not sure of the title because of the language position')
return m

View File

@@ -24,16 +24,19 @@ from guessit import u
from guessit import slogging, guess_file_info
from optparse import OptionParser
import logging
import sys
import os
import locale
def detect_filename(filename, filetype, info=['filename']):
def detect_filename(filename, filetype, info=['filename'], advanced = False):
filename = u(filename)
print('For:', filename)
print('GuessIt found:', guess_file_info(filename, filetype, info).nice_string())
print('GuessIt found:', guess_file_info(filename, filetype, info).nice_string(advanced))
def run_demo(episodes=True, movies=True):
def run_demo(episodes=True, movies=True, advanced=False):
# NOTE: tests should not be added here but rather in the tests/ folder
# this is just intended as a quick example
if episodes:
@@ -50,7 +53,7 @@ def run_demo(episodes=True, movies=True):
for f in testeps:
print('-'*80)
detect_filename(f, filetype='episode')
detect_filename(f, filetype='episode', advanced=advanced)
if movies:
@@ -77,12 +80,17 @@ def run_demo(episodes=True, movies=True):
for f in testmovies:
print('-'*80)
detect_filename(f, filetype = 'movie')
detect_filename(f, filetype = 'movie', advanced = advanced)
def main():
slogging.setupLogging()
# see http://bugs.python.org/issue2128
if sys.version_info.major < 3 and os.name == 'nt':
for i, a in enumerate(sys.argv):
sys.argv[i] = a.decode(locale.getpreferredencoding())
parser = OptionParser(usage = 'usage: %prog [options] file1 [file2...]')
parser.add_option('-v', '--verbose', action='store_true', dest='verbose', default=False,
help = 'display debug output')
@@ -92,6 +100,8 @@ def main():
'them, comma-separated')
parser.add_option('-t', '--type', dest = 'filetype', default = 'autodetect',
help = 'the suggested file type: movie, episode or autodetect')
parser.add_option('-a', '--advanced', dest = 'advanced', action='store_true', default = False,
help = 'display advanced information for filename guesses, as json output')
parser.add_option('-d', '--demo', action='store_true', dest='demo', default=False,
help = 'run a few builtin tests instead of analyzing a file')
@@ -100,13 +110,14 @@ def main():
logging.getLogger('guessit').setLevel(logging.DEBUG)
if options.demo:
run_demo(episodes=True, movies=True)
run_demo(episodes=True, movies=True, advanced=options.advanced)
else:
if args:
for filename in args:
detect_filename(filename,
filetype = options.filetype,
info = options.info.split(','))
info = options.info.split(','),
advanced = options.advanced)
else:
parser.print_help()

View File

@@ -44,13 +44,14 @@ def split_path(path):
result = []
while True:
head, tail = os.path.split(path)
headlen = len(head)
# on Unix systems, the root folder is '/'
if head == '/' and tail == '':
if head and head == '/'*headlen and tail == '':
return ['/'] + result
# on Windows, the root folder is a drive letter (eg: 'C:\') or for shares \\
if ((len(head) == 3 and head[1:] == ':\\') or (len(head) == 2 and head == '\\\\')) and tail == '':
if ((headlen == 3 and head[1:] == ':\\') or (headlen == 2 and head == '\\\\')) and tail == '':
return [head] + result
if head == '' and tail == '':
@@ -61,6 +62,7 @@ def split_path(path):
path = head
continue
# otherwise, add the last path fragment and keep splitting
result = [tail] + result
path = head

View File

@@ -41,15 +41,21 @@ class Guess(UnicodeMixin, dict):
confidence = kwargs.pop('confidence')
except KeyError:
confidence = 0
try:
raw = kwargs.pop('raw')
except KeyError:
raw = None
dict.__init__(self, *args, **kwargs)
self._confidence = {}
self._raw = {}
for prop in self:
self._confidence[prop] = confidence
def to_dict(self):
self._raw[prop] = raw
def to_dict(self, advanced=False):
data = dict(self)
for prop, value in data.items():
if isinstance(value, datetime.date):
@@ -58,46 +64,65 @@ class Guess(UnicodeMixin, dict):
data[prop] = u(value)
elif isinstance(value, list):
data[prop] = [u(x) for x in value]
if advanced:
data[prop] = {"value": data[prop], "raw": self.raw(prop), "confidence": self.confidence(prop)}
return data
def nice_string(self):
data = self.to_dict()
parts = json.dumps(data, indent=4).split('\n')
for i, p in enumerate(parts):
if p[:5] != ' "':
continue
prop = p.split('"')[1]
parts[i] = (' [%.2f] "' % self.confidence(prop)) + p[5:]
return '\n'.join(parts)
def nice_string(self, advanced=False):
if advanced:
data = self.to_dict(advanced)
return json.dumps(data, indent=4)
else:
data = self.to_dict()
parts = json.dumps(data, indent=4).split('\n')
for i, p in enumerate(parts):
if p[:5] != ' "':
continue
prop = p.split('"')[1]
parts[i] = (' [%.2f] "' % self.confidence(prop)) + p[5:]
return '\n'.join(parts)
def __unicode__(self):
return u(self.to_dict())
def confidence(self, prop):
return self._confidence.get(prop, -1)
def raw(self, prop):
return self._raw.get(prop, None)
def set(self, prop, value, confidence=None):
def set(self, prop, value, confidence=None, raw=None):
self[prop] = value
if confidence is not None:
self._confidence[prop] = confidence
if raw is not None:
self._raw[prop] = raw
def set_confidence(self, prop, value):
self._confidence[prop] = value
def set_raw(self, prop, value):
self._raw[prop] = value
def update(self, other, confidence=None):
def update(self, other, confidence=None, raw=None):
dict.update(self, other)
if isinstance(other, Guess):
for prop in other:
self._confidence[prop] = other.confidence(prop)
self._raw[prop] = other.raw(prop)
if confidence is not None:
for prop in other:
self._confidence[prop] = confidence
if raw is not None:
for prop in other:
self._raw[prop] = raw
def update_highest_confidence(self, other):
"""Update this guess with the values from the given one. In case
there is property present in both, only the one with the highest one
@@ -110,6 +135,7 @@ class Guess(UnicodeMixin, dict):
continue
self[prop] = other[prop]
self._confidence[prop] = other.confidence(prop)
self._raw[prop] = other.raw(prop)
def choose_int(g1, g2):
@@ -181,7 +207,7 @@ def choose_string(g1, g2):
elif v1l in v2l:
return (v1, combined_prob)
# in case of conflict, return the one with highest priority
# in case of conflict, return the one with highest confidence
else:
if c1 > c2:
return (v1, c1 - c2)
@@ -288,7 +314,8 @@ def merge_all(guesses, append=None):
result.set(prop, result.get(prop, []) + [g[prop]],
# TODO: what to do with confidence here? maybe an
# arithmetic mean...
confidence=g.confidence(prop))
confidence=g.confidence(prop),
raw=g.raw(prop))
del g[prop]

View File

@@ -296,7 +296,7 @@ UNDETERMINED = Language('und')
ALL_LANGUAGES = frozenset(Language(lng) for lng in lng_all_names) - frozenset([UNDETERMINED])
ALL_LANGUAGES_NAMES = lng_all_names
def search_language(string, lang_filter=None):
def search_language(string, lang_filter=None, skip=None):
"""Looks for language patterns, and if found return the language object,
its group span and an associated confidence.
@@ -345,6 +345,16 @@ def search_language(string, lang_filter=None):
if pos != -1:
end = pos + len(lang)
# skip if span in in skip list
while skip and (pos - 1, end - 1) in skip:
pos = slow.find(lang, end)
if pos == -1:
continue
end = pos + len(lang)
if pos == -1:
continue
# make sure our word is always surrounded by separators
if slow[pos - 1] not in sep or slow[end] not in sep:
continue

View File

@@ -21,14 +21,14 @@
from __future__ import unicode_literals
from guessit import PY3, u, base_text_type
from guessit.matchtree import MatchTree
from guessit.textutils import normalize_unicode
from guessit.textutils import normalize_unicode, clean_string
import logging
log = logging.getLogger(__name__)
class IterativeMatcher(object):
def __init__(self, filename, filetype='autodetect', opts=None):
def __init__(self, filename, filetype='autodetect', opts=None, transfo_opts=None):
"""An iterative matcher tries to match different patterns that appear
in the filename.
@@ -38,7 +38,8 @@ class IterativeMatcher(object):
a movie.
The recognized 'filetype' values are:
[ autodetect, subtitle, movie, moviesubtitle, episode, episodesubtitle ]
[ autodetect, subtitle, info, movie, moviesubtitle, movieinfo, episode,
episodesubtitle, episodeinfo ]
The IterativeMatcher works mainly in 2 steps:
@@ -61,15 +62,20 @@ class IterativeMatcher(object):
it corresponds to a video codec, denoted by the letter'v' in the 4th line.
(for more info, see guess.matchtree.to_string)
Second, it tries to merge all this information into a single object
containing all the found properties, and does some (basic) conflict
resolution when they arise.
Second, it tries to merge all this information into a single object
containing all the found properties, and does some (basic) conflict
resolution when they arise.
When you create the Matcher, you can pass it:
- a list 'opts' of option names, that act as global flags
- a dict 'transfo_opts' of { transfo_name: (transfo_args, transfo_kwargs) }
with which to call the transfo.process() function.
"""
valid_filetypes = ('autodetect', 'subtitle', 'video',
'movie', 'moviesubtitle',
'episode', 'episodesubtitle')
valid_filetypes = ('autodetect', 'subtitle', 'info', 'video',
'movie', 'moviesubtitle', 'movieinfo',
'episode', 'episodesubtitle', 'episodeinfo')
if filetype not in valid_filetypes:
raise ValueError("filetype needs to be one of %s" % valid_filetypes)
if not PY3 and not isinstance(filename, unicode):
@@ -80,10 +86,22 @@ class IterativeMatcher(object):
if opts is None:
opts = []
elif isinstance(opts, base_text_type):
opts = opts.split()
if not isinstance(opts, list):
raise ValueError('opts must be a list of option names! Received: type=%s val=%s',
type(opts), opts)
if transfo_opts is None:
transfo_opts = {}
if not isinstance(transfo_opts, dict):
raise ValueError('transfo_opts must be a dict of { transfo_name: (args, kwargs) }. '+
'Received: type=%s val=%s', type(transfo_opts), transfo_opts)
self.match_tree = MatchTree(filename)
# sanity check: make sure we don't process a (mostly) empty string
if clean_string(filename) == '':
return
mtree = self.match_tree
mtree.guess.set('type', filetype, confidence=1.0)
@@ -91,7 +109,11 @@ class IterativeMatcher(object):
transfo = __import__('guessit.transfo.' + transfo_name,
globals=globals(), locals=locals(),
fromlist=['process'], level=0)
transfo.process(mtree, *args, **kwargs)
default_args, default_kwargs = transfo_opts.get(transfo_name, ((), {}))
all_args = args or default_args
all_kwargs = dict(default_kwargs)
all_kwargs.update(kwargs) # keep all kwargs merged together
transfo.process(mtree, *all_args, **all_kwargs)
# 1- first split our path into dirs + basename + ext
apply_transfo('split_path_components')
@@ -111,7 +133,7 @@ class IterativeMatcher(object):
# - language before episodes_rexps
# - properties before language (eg: he-aac vs hebrew)
# - release_group before properties (eg: XviD-?? vs xvid)
if mtree.guess['type'] in ('episode', 'episodesubtitle'):
if mtree.guess['type'] in ('episode', 'episodesubtitle', 'episodeinfo'):
strategy = [ 'guess_date', 'guess_website', 'guess_release_group',
'guess_properties', 'guess_language',
'guess_video_rexps',
@@ -124,6 +146,7 @@ class IterativeMatcher(object):
if 'nolanguage' in opts:
strategy.remove('guess_language')
for name in strategy:
apply_transfo(name)
@@ -143,7 +166,7 @@ class IterativeMatcher(object):
# 5- try to identify the remaining unknown groups by looking at their
# position relative to other known elements
if mtree.guess['type'] in ('episode', 'episodesubtitle'):
if mtree.guess['type'] in ('episode', 'episodesubtitle', 'episodeinfo'):
apply_transfo('guess_episode_info_from_position')
else:
apply_transfo('guess_movie_title_from_position')

View File

@@ -25,6 +25,8 @@ import re
subtitle_exts = [ 'srt', 'idx', 'sub', 'ssa' ]
info_exts = [ 'nfo' ]
video_exts = ['3g2', '3gp', '3gp2', 'asf', 'avi', 'divx', 'flv', 'm4v', 'mk2',
'mka', 'mkv', 'mov', 'mp4', 'mp4a', 'mpeg', 'mpg', 'ogg', 'ogm',
'ogv', 'qt', 'ra', 'ram', 'rm', 'ts', 'wav', 'webm', 'wma', 'wmv']
@@ -32,7 +34,7 @@ video_exts = ['3g2', '3gp', '3gp2', 'asf', 'avi', 'divx', 'flv', 'm4v', 'mk2',
group_delimiters = [ '()', '[]', '{}' ]
# separator character regexp
sep = r'[][)(}{+ /\._-]' # regexp art, hehe :D
sep = r'[][,)(}{+ /\._-]' # regexp art, hehe :D
# character used to represent a deleted char (when matching groups)
deleted = '_'
@@ -49,7 +51,7 @@ episode_rexps = [ # ... Season 2 ...
#(r'[Ss](?P<season>[0-9]{1,3})[^0-9]?(?P<bonusNumber>(?:-?[xX-][0-9]{1,3})+)[^0-9]', 1.0, (0, -1)),
# ... 2x13 ...
(r'[^0-9](?P<season>[0-9]{1,2})[^0-9]?(?P<episodeNumber>(?:-?[xX][0-9]{1,3})+)[^0-9]', 1.0, (1, -1)),
(r'[^0-9](?P<season>[0-9]{1,2})[^0-9 .-]?(?P<episodeNumber>(?:-?[xX][0-9]{1,3})+)[^0-9]', 1.0, (1, -1)),
# ... s02 ...
#(sep + r's(?P<season>[0-9]{1,2})' + sep, 0.6, (1, -1)),
@@ -122,9 +124,12 @@ prop_multi = { 'format': { 'DVD': [ 'DVD', 'DVD-Rip', 'VIDEO-TS', 'DVDivX' ],
'VHS': [ 'VHS' ],
'WEB-DL': [ 'WEB-DL' ] },
'is3D': { True: [ '3D' ] },
'screenSize': { '480p': [ '480[pi]?' ],
'720p': [ '720[pi]?' ],
'1080p': [ '1080[pi]?' ] },
'1080i': [ '1080i' ],
'1080p': [ '1080p', '1080[^i]' ] },
'videoCodec': { 'XviD': [ 'Xvid' ],
'DivX': [ 'DVDivX', 'DivX' ],
@@ -140,7 +145,7 @@ prop_multi = { 'format': { 'DVD': [ 'DVD', 'DVD-Rip', 'VIDEO-TS', 'DVDivX' ],
'DTS': [ 'DTS' ],
'AAC': [ 'He-AAC', 'AAC-He', 'AAC' ] },
'audioChannels': { '5.1': [ r'5\.1', 'DD5[\._ ]1', '5ch' ] },
'audioChannels': { '5.1': [ r'5\.1', 'DD5[._ ]1', '5ch' ] },
'episodeFormat': { 'Minisode': [ 'Minisodes?' ] }
@@ -170,7 +175,7 @@ prop_single = { 'releaseGroup': [ 'ESiR', 'WAF', 'SEPTiC', r'\[XCT\]', 'iNT', 'P
}
_dash = '-'
_psep = '[-\. _]?'
_psep = '[-. _]?'
def _to_rexp(prop):
return re.compile(prop.replace(_dash, _psep), re.IGNORECASE)
@@ -237,8 +242,9 @@ def canonical_form(string):
def compute_canonical_form(property_name, value):
"""Return the canonical form of a property given its type if it is a valid
one, None otherwise."""
for canonical_form, rexps in properties_rexps[property_name].items():
for rexp in rexps:
if rexp.match(value):
return canonical_form
if isinstance(value, basestring):
for canonical_form, rexps in properties_rexps[property_name].items():
for rexp in rexps:
if rexp.match(value):
return canonical_form
return None

View File

@@ -31,14 +31,15 @@ RED_FONT = "\x1B[0;31m"
RESET_FONT = "\x1B[0m"
def setupLogging(colored=True, with_time=False, with_thread=False, filename=None):
def setupLogging(colored=True, with_time=False, with_thread=False, filename=None, with_lineno=False):
"""Set up a nice colored logger as the main application logger."""
class SimpleFormatter(logging.Formatter):
def __init__(self, with_time, with_thread):
self.fmt = (('%(asctime)s ' if with_time else '') +
'%(levelname)-8s ' +
'[%(name)s:%(funcName)s]' +
'[%(name)s:%(funcName)s' +
(':%(lineno)s' if with_lineno else '') + ']' +
('[%(threadName)s]' if with_thread else '') +
' -- %(message)s')
logging.Formatter.__init__(self, self.fmt)
@@ -47,7 +48,8 @@ def setupLogging(colored=True, with_time=False, with_thread=False, filename=None
def __init__(self, with_time, with_thread):
self.fmt = (('%(asctime)s ' if with_time else '') +
'-CC-%(levelname)-8s ' +
BLUE_FONT + '[%(name)s:%(funcName)s]' +
BLUE_FONT + '[%(name)s:%(funcName)s' +
(':%(lineno)s' if with_lineno else '') + ']' +
RESET_FONT + ('[%(threadName)s]' if with_thread else '') +
' -- %(message)s')

View File

@@ -43,10 +43,13 @@ def strip_brackets(s):
return s
def clean_string(s):
for c in sep[:-2]: # do not remove dashes ('-')
s = s.replace(c, ' ')
parts = s.split()
def clean_string(st):
for c in sep:
# do not remove certain chars
if c in ['-', ',']:
continue
st = st.replace(c, ' ')
parts = st.split()
result = ' '.join(p for p in parts if p != '')
# now also remove dashes on the outer part of the string

View File

@@ -28,7 +28,7 @@ log = logging.getLogger(__name__)
def found_property(node, name, confidence):
node.guess = Guess({name: node.clean_value}, confidence=confidence)
node.guess = Guess({name: node.clean_value}, confidence=confidence, raw=node.value)
log.debug('Found with confidence %.2f: %s' % (confidence, node.guess))
@@ -52,11 +52,17 @@ def format_guess(guess):
def find_and_split_node(node, strategy, logger):
string = ' %s ' % node.value # add sentinels
for matcher, confidence in strategy:
for matcher, confidence, args, kwargs in strategy:
all_args = [string]
if getattr(matcher, 'use_node', False):
result, span = matcher(string, node)
all_args.append(node)
if args:
all_args.append(args)
if kwargs:
result, span = matcher(*all_args, **kwargs)
else:
result, span = matcher(string)
result, span = matcher(*all_args)
if result:
# readjust span to compensate for sentinels
@@ -69,7 +75,7 @@ def find_and_split_node(node, strategy, logger):
if confidence is None:
confidence = 1.0
guess = format_guess(Guess(result, confidence=confidence))
guess = format_guess(Guess(result, confidence=confidence, raw=string[span[0] + 1:span[1] + 1]))
msg = 'Found with confidence %.2f: %s' % (confidence, guess)
(logger or log).debug(msg)
@@ -84,10 +90,12 @@ def find_and_split_node(node, strategy, logger):
class SingleNodeGuesser(object):
def __init__(self, guess_func, confidence, logger=None):
def __init__(self, guess_func, confidence, logger, *args, **kwargs):
self.guess_func = guess_func
self.confidence = confidence
self.logger = logger
self.args = args
self.kwargs = kwargs
def process(self, mtree):
# strategy is a list of pairs (guesser, confidence)
@@ -95,7 +103,7 @@ class SingleNodeGuesser(object):
# it will override it, otherwise it will leave the guess confidence
# - if the guesser returns a simple dict as a guess and confidence is
# specified, it will use it, or 1.0 otherwise
strategy = [ (self.guess_func, self.confidence) ]
strategy = [ (self.guess_func, self.confidence, self.args, self.kwargs) ]
for node in mtree.unidentified_leaves():
find_and_split_node(node, strategy, self.logger)

View File

@@ -45,4 +45,4 @@ def process(mtree):
except ValueError:
continue
node.guess = Guess(country=country, confidence=1.0)
node.guess = Guess(country=country, confidence=1.0, raw=c)

View File

@@ -40,27 +40,22 @@ def guess_episodes_rexps(string):
for rexp, confidence, span_adjust in episode_rexps:
match = re.search(rexp, string, re.IGNORECASE)
if match:
guess = Guess(match.groupdict(), confidence=confidence)
span = (match.start() + span_adjust[0],
span = (match.start() + span_adjust[0],
match.end() + span_adjust[1])
# episodes which have a season > 30 are most likely errors
# (Simpsons is at 24!)
if int(guess.get('season', 0)) > 30:
continue
guess = Guess(match.groupdict(), confidence=confidence, raw=string[span[0]:span[1]])
# decide whether we have only a single episode number or an
# episode list
if guess.get('episodeNumber'):
eplist = number_list(guess['episodeNumber'])
guess.set('episodeNumber', eplist[0], confidence=confidence)
guess.set('episodeNumber', eplist[0], confidence=confidence, raw=string[span[0]:span[1]])
if len(eplist) > 1:
guess.set('episodeList', eplist, confidence=confidence)
guess.set('episodeList', eplist, confidence=confidence, raw=string[span[0]:span[1]])
if guess.get('bonusNumber'):
eplist = number_list(guess['bonusNumber'])
guess.set('bonusNumber', eplist[0], confidence=confidence)
guess.set('bonusNumber', eplist[0], confidence=confidence, raw=string[span[0]:span[1]])
return guess, span

View File

@@ -20,7 +20,7 @@
from __future__ import unicode_literals
from guessit import Guess
from guessit.patterns import (subtitle_exts, video_exts, episode_rexps,
from guessit.patterns import (subtitle_exts, info_exts, video_exts, episode_rexps,
find_properties, compute_canonical_form)
from guessit.date import valid_year
from guessit.textutils import clean_string
@@ -53,12 +53,16 @@ def guess_filetype(mtree, filetype):
filetype_container[0] = 'episode'
elif filetype_container[0] == 'subtitle':
filetype_container[0] = 'episodesubtitle'
elif filetype_container[0] == 'info':
filetype_container[0] = 'episodeinfo'
def upgrade_movie():
if filetype_container[0] == 'video':
filetype_container[0] = 'movie'
elif filetype_container[0] == 'subtitle':
filetype_container[0] = 'moviesubtitle'
elif filetype_container[0] == 'info':
filetype_container[0] = 'movieinfo'
def upgrade_subtitle():
if 'movie' in filetype_container[0]:
@@ -68,6 +72,14 @@ def guess_filetype(mtree, filetype):
else:
filetype_container[0] = 'subtitle'
def upgrade_info():
if 'movie' in filetype_container[0]:
filetype_container[0] = 'movieinfo'
elif 'episode' in filetype_container[0]:
filetype_container[0] = 'episodeinfo'
else:
filetype_container[0] = 'info'
def upgrade(type='unknown'):
if filetype_container[0] == 'autodetect':
filetype_container[0] = type
@@ -78,6 +90,9 @@ def guess_filetype(mtree, filetype):
if fileext in subtitle_exts:
upgrade_subtitle()
other = { 'container': fileext }
elif fileext in info_exts:
upgrade_info()
other = { 'container': fileext }
elif fileext in video_exts:
upgrade(type='video')
other = { 'container': fileext }
@@ -104,17 +119,20 @@ def guess_filetype(mtree, filetype):
fname = clean_string(filename).lower()
for m in MOVIES:
if m in fname:
log.debug('Found in exception list of movies -> type = movie')
upgrade_movie()
for s in SERIES:
if s in fname:
log.debug('Found in exception list of series -> type = episode')
upgrade_episode()
# now look whether there are some specific hints for episode vs movie
if filetype_container[0] in ('video', 'subtitle'):
if filetype_container[0] in ('video', 'subtitle', 'info'):
# if we have an episode_rexp (eg: s02e13), it is an episode
for rexp, _, _ in episode_rexps:
match = re.search(rexp, filename, re.IGNORECASE)
if match:
log.debug('Found matching regexp: "%s" (string = "%s") -> type = episode', rexp, match.group())
upgrade_episode()
break
@@ -133,24 +151,29 @@ def guess_filetype(mtree, filetype):
possible = False
if possible:
log.debug('Found possible episode number: %s (from string "%s") -> type = episode', epnumber, match.group())
upgrade_episode()
# if we have certain properties characteristic of episodes, it is an ep
for prop, value, _, _ in find_properties(filename):
log.debug('prop: %s = %s' % (prop, value))
if prop == 'episodeFormat':
log.debug('Found characteristic property of episodes: %s = "%s"', prop, value)
upgrade_episode()
break
elif compute_canonical_form('format', value) == 'DVB':
log.debug('Found characteristic property of episodes: %s = "%s"', prop, value)
upgrade_episode()
break
# origin-specific type
if 'tvu.org.ru' in filename:
log.debug('Found characteristic property of episodes: %s = "%s"', prop, value)
upgrade_episode()
# if no episode info found, assume it's a movie
log.debug('Nothing characteristic found, assuming type = movie')
upgrade_movie()
filetype = filetype_container[0]

View File

@@ -22,22 +22,34 @@ from __future__ import unicode_literals
from guessit import Guess
from guessit.transfo import SingleNodeGuesser
from guessit.language import search_language
from guessit.textutils import clean_string, find_words
import logging
log = logging.getLogger(__name__)
def guess_language(string):
language, span, confidence = search_language(string)
def guess_language(string, node, skip=None):
if skip:
relative_skip = []
for entry in skip:
node_idx = entry['node_idx']
span = entry['span']
if node_idx == node.node_idx[:len(node_idx)]:
relative_span = (span[0] - node.offset + 1, span[1] - node.offset + 1)
relative_skip.append(relative_span)
skip = relative_skip
language, span, confidence = search_language(string, skip=skip)
if language:
return (Guess({'language': language},
confidence=confidence),
confidence=confidence,
raw= string[span[0]:span[1]]),
span)
return None, None
guess_language.use_node = True
def process(mtree):
SingleNodeGuesser(guess_language, None, log).process(mtree)
def process(mtree, *args, **kwargs):
SingleNodeGuesser(guess_language, None, log, *args, **kwargs).process(mtree)
# Note: 'language' is promoted to 'subtitleLanguage' in the post_process transfo

View File

@@ -29,7 +29,8 @@ log = logging.getLogger(__name__)
def process(mtree):
def found_property(node, name, value, confidence):
node.guess = Guess({ name: value },
confidence=confidence)
confidence=confidence,
raw=value)
log.debug('Found with confidence %.2f: %s' % (confidence, node.guess))
def found_title(node, confidence):

View File

@@ -38,9 +38,10 @@ def guess_video_rexps(string):
# the soonest that we can catch it)
if metadata.get('cdNumberTotal', -1) is None:
del metadata['cdNumberTotal']
return (Guess(metadata, confidence=confidence),
(match.start() + span_adjust[0],
match.end() + span_adjust[1] - 2))
span = (match.start() + span_adjust[0],
match.end() + span_adjust[1] - 2)
return (Guess(metadata, confidence=confidence, raw=string[span[0]:span[1]]),
span)
return None, None

View File

@@ -48,9 +48,9 @@ def guess_weak_episodes_rexps(string, node):
continue
return Guess({ 'season': season,
'episodeNumber': epnum },
confidence=0.6), span
confidence=0.6, raw=string[span[0]:span[1]]), span
else:
return Guess(metadata, confidence=0.3), span
return Guess(metadata, confidence=0.3, raw=string[span[0]:span[1]]), span
return None, None

View File

@@ -1,4 +1,4 @@
"""
"""
HTML parsing library based on the WHATWG "HTML5"
specification. The parser is designed to be compatible with existing
HTML found in the wild and implements well-defined error recovery that
@@ -8,10 +8,16 @@ Example usage:
import html5lib
f = open("my_document.html")
tree = html5lib.parse(f)
tree = html5lib.parse(f)
"""
__version__ = "0.95-dev"
from html5parser import HTMLParser, parse, parseFragment
from treebuilders import getTreeBuilder
from treewalkers import getTreeWalker
from serializer import serialize
from __future__ import absolute_import, division, unicode_literals
from .html5parser import HTMLParser, parse, parseFragment
from .treebuilders import getTreeBuilder
from .treewalkers import getTreeWalker
from .serializer import serialize
__all__ = ["HTMLParser", "parse", "parseFragment", "getTreeBuilder",
"getTreeWalker", "serialize"]
__version__ = "0.99"

File diff suppressed because it is too large Load Diff

View File

@@ -1,3 +1,5 @@
from __future__ import absolute_import, division, unicode_literals
class Filter(object):
def __init__(self, source):

View File

@@ -0,0 +1,20 @@
from __future__ import absolute_import, division, unicode_literals
from . import _base
try:
from collections import OrderedDict
except ImportError:
from ordereddict import OrderedDict
class Filter(_base.Filter):
def __iter__(self):
for token in _base.Filter.__iter__(self):
if token["type"] in ("StartTag", "EmptyTag"):
attrs = OrderedDict()
for name, value in sorted(token["data"].items(),
key=lambda x: x[0]):
attrs[name] = value
token["data"] = attrs
yield token

View File

@@ -1,127 +0,0 @@
#
# The goal is to finally have a form filler where you pass data for
# each form, using the algorithm for "Seeding a form with initial values"
# See http://www.whatwg.org/specs/web-forms/current-work/#seeding
#
import _base
from html5lib.constants import spaceCharacters
spaceCharacters = u"".join(spaceCharacters)
class SimpleFilter(_base.Filter):
def __init__(self, source, fieldStorage):
_base.Filter.__init__(self, source)
self.fieldStorage = fieldStorage
def __iter__(self):
field_indices = {}
state = None
field_name = None
for token in _base.Filter.__iter__(self):
type = token["type"]
if type in ("StartTag", "EmptyTag"):
name = token["name"].lower()
if name == "input":
field_name = None
field_type = None
input_value_index = -1
input_checked_index = -1
for i,(n,v) in enumerate(token["data"]):
n = n.lower()
if n == u"name":
field_name = v.strip(spaceCharacters)
elif n == u"type":
field_type = v.strip(spaceCharacters)
elif n == u"checked":
input_checked_index = i
elif n == u"value":
input_value_index = i
value_list = self.fieldStorage.getlist(field_name)
field_index = field_indices.setdefault(field_name, 0)
if field_index < len(value_list):
value = value_list[field_index]
else:
value = ""
if field_type in (u"checkbox", u"radio"):
if value_list:
if token["data"][input_value_index][1] == value:
if input_checked_index < 0:
token["data"].append((u"checked", u""))
field_indices[field_name] = field_index + 1
elif input_checked_index >= 0:
del token["data"][input_checked_index]
elif field_type not in (u"button", u"submit", u"reset"):
if input_value_index >= 0:
token["data"][input_value_index] = (u"value", value)
else:
token["data"].append((u"value", value))
field_indices[field_name] = field_index + 1
field_type = None
field_name = None
elif name == "textarea":
field_type = "textarea"
field_name = dict((token["data"])[::-1])["name"]
elif name == "select":
field_type = "select"
attributes = dict(token["data"][::-1])
field_name = attributes.get("name")
is_select_multiple = "multiple" in attributes
is_selected_option_found = False
elif field_type == "select" and field_name and name == "option":
option_selected_index = -1
option_value = None
for i,(n,v) in enumerate(token["data"]):
n = n.lower()
if n == "selected":
option_selected_index = i
elif n == "value":
option_value = v.strip(spaceCharacters)
if option_value is None:
raise NotImplementedError("<option>s without a value= attribute")
else:
value_list = self.fieldStorage.getlist(field_name)
if value_list:
field_index = field_indices.setdefault(field_name, 0)
if field_index < len(value_list):
value = value_list[field_index]
else:
value = ""
if (is_select_multiple or not is_selected_option_found) and option_value == value:
if option_selected_index < 0:
token["data"].append((u"selected", u""))
field_indices[field_name] = field_index + 1
is_selected_option_found = True
elif option_selected_index >= 0:
del token["data"][option_selected_index]
elif field_type is not None and field_name and type == "EndTag":
name = token["name"].lower()
if name == field_type:
if name == "textarea":
value_list = self.fieldStorage.getlist(field_name)
if value_list:
field_index = field_indices.setdefault(field_name, 0)
if field_index < len(value_list):
value = value_list[field_index]
else:
value = ""
yield {"type": "Characters", "data": value}
field_indices[field_name] = field_index + 1
field_name = None
elif name == "option" and field_type == "select":
pass # TODO: part of "option without value= attribute" processing
elif field_type == "textarea":
continue # ignore token
yield token

View File

@@ -1,4 +1,7 @@
import _base
from __future__ import absolute_import, division, unicode_literals
from . import _base
class Filter(_base.Filter):
def __init__(self, source, encoding):
@@ -13,44 +16,44 @@ class Filter(_base.Filter):
for token in _base.Filter.__iter__(self):
type = token["type"]
if type == "StartTag":
if token["name"].lower() == u"head":
if token["name"].lower() == "head":
state = "in_head"
elif type == "EmptyTag":
if token["name"].lower() == u"meta":
# replace charset with actual encoding
has_http_equiv_content_type = False
for (namespace,name),value in token["data"].iteritems():
if namespace != None:
continue
elif name.lower() == u'charset':
token["data"][(namespace,name)] = self.encoding
meta_found = True
break
elif name == u'http-equiv' and value.lower() == u'content-type':
has_http_equiv_content_type = True
else:
if has_http_equiv_content_type and (None, u"content") in token["data"]:
token["data"][(None, u"content")] = u'text/html; charset=%s' % self.encoding
meta_found = True
if token["name"].lower() == "meta":
# replace charset with actual encoding
has_http_equiv_content_type = False
for (namespace, name), value in token["data"].items():
if namespace is not None:
continue
elif name.lower() == 'charset':
token["data"][(namespace, name)] = self.encoding
meta_found = True
break
elif name == 'http-equiv' and value.lower() == 'content-type':
has_http_equiv_content_type = True
else:
if has_http_equiv_content_type and (None, "content") in token["data"]:
token["data"][(None, "content")] = 'text/html; charset=%s' % self.encoding
meta_found = True
elif token["name"].lower() == u"head" and not meta_found:
elif token["name"].lower() == "head" and not meta_found:
# insert meta into empty head
yield {"type": "StartTag", "name": u"head",
yield {"type": "StartTag", "name": "head",
"data": token["data"]}
yield {"type": "EmptyTag", "name": u"meta",
"data": {(None, u"charset"): self.encoding}}
yield {"type": "EndTag", "name": u"head"}
yield {"type": "EmptyTag", "name": "meta",
"data": {(None, "charset"): self.encoding}}
yield {"type": "EndTag", "name": "head"}
meta_found = True
continue
elif type == "EndTag":
if token["name"].lower() == u"head" and pending:
if token["name"].lower() == "head" and pending:
# insert meta into head (if necessary) and flush pending queue
yield pending.pop(0)
if not meta_found:
yield {"type": "EmptyTag", "name": u"meta",
"data": {(None, u"charset"): self.encoding}}
yield {"type": "EmptyTag", "name": "meta",
"data": {(None, "charset"): self.encoding}}
while pending:
yield pending.pop(0)
meta_found = True

View File

@@ -1,13 +1,18 @@
from __future__ import absolute_import, division, unicode_literals
from gettext import gettext
_ = gettext
import _base
from html5lib.constants import cdataElements, rcdataElements, voidElements
from . import _base
from ..constants import cdataElements, rcdataElements, voidElements
from html5lib.constants import spaceCharacters
spaceCharacters = u"".join(spaceCharacters)
from ..constants import spaceCharacters
spaceCharacters = "".join(spaceCharacters)
class LintError(Exception):
pass
class LintError(Exception): pass
class Filter(_base.Filter):
def __iter__(self):
@@ -18,24 +23,24 @@ class Filter(_base.Filter):
if type in ("StartTag", "EmptyTag"):
name = token["name"]
if contentModelFlag != "PCDATA":
raise LintError(_("StartTag not in PCDATA content model flag: %s") % name)
if not isinstance(name, unicode):
raise LintError(_(u"Tag name is not a string: %r") % name)
raise LintError(_("StartTag not in PCDATA content model flag: %(tag)s") % {"tag": name})
if not isinstance(name, str):
raise LintError(_("Tag name is not a string: %(tag)r") % {"tag": name})
if not name:
raise LintError(_(u"Empty tag name"))
raise LintError(_("Empty tag name"))
if type == "StartTag" and name in voidElements:
raise LintError(_(u"Void element reported as StartTag token: %s") % name)
raise LintError(_("Void element reported as StartTag token: %(tag)s") % {"tag": name})
elif type == "EmptyTag" and name not in voidElements:
raise LintError(_(u"Non-void element reported as EmptyTag token: %s") % token["name"])
raise LintError(_("Non-void element reported as EmptyTag token: %(tag)s") % {"tag": token["name"]})
if type == "StartTag":
open_elements.append(name)
for name, value in token["data"]:
if not isinstance(name, unicode):
raise LintError(_("Attribute name is not a string: %r") % name)
if not isinstance(name, str):
raise LintError(_("Attribute name is not a string: %(name)r") % {"name": name})
if not name:
raise LintError(_(u"Empty attribute name"))
if not isinstance(value, unicode):
raise LintError(_("Attribute value is not a string: %r") % value)
raise LintError(_("Empty attribute name"))
if not isinstance(value, str):
raise LintError(_("Attribute value is not a string: %(value)r") % {"value": value})
if name in cdataElements:
contentModelFlag = "CDATA"
elif name in rcdataElements:
@@ -45,15 +50,15 @@ class Filter(_base.Filter):
elif type == "EndTag":
name = token["name"]
if not isinstance(name, unicode):
raise LintError(_(u"Tag name is not a string: %r") % name)
if not isinstance(name, str):
raise LintError(_("Tag name is not a string: %(tag)r") % {"tag": name})
if not name:
raise LintError(_(u"Empty tag name"))
raise LintError(_("Empty tag name"))
if name in voidElements:
raise LintError(_(u"Void element reported as EndTag token: %s") % name)
raise LintError(_("Void element reported as EndTag token: %(tag)s") % {"tag": name})
start_name = open_elements.pop()
if start_name != name:
raise LintError(_(u"EndTag (%s) does not match StartTag (%s)") % (name, start_name))
raise LintError(_("EndTag (%(end)s) does not match StartTag (%(start)s)") % {"end": name, "start": start_name})
contentModelFlag = "PCDATA"
elif type == "Comment":
@@ -62,27 +67,27 @@ class Filter(_base.Filter):
elif type in ("Characters", "SpaceCharacters"):
data = token["data"]
if not isinstance(data, unicode):
raise LintError(_("Attribute name is not a string: %r") % data)
if not isinstance(data, str):
raise LintError(_("Attribute name is not a string: %(name)r") % {"name": data})
if not data:
raise LintError(_(u"%s token with empty data") % type)
raise LintError(_("%(type)s token with empty data") % {"type": type})
if type == "SpaceCharacters":
data = data.strip(spaceCharacters)
if data:
raise LintError(_(u"Non-space character(s) found in SpaceCharacters token: ") % data)
raise LintError(_("Non-space character(s) found in SpaceCharacters token: %(token)r") % {"token": data})
elif type == "Doctype":
name = token["name"]
if contentModelFlag != "PCDATA":
raise LintError(_("Doctype not in PCDATA content model flag: %s") % name)
if not isinstance(name, unicode):
raise LintError(_(u"Tag name is not a string: %r") % name)
raise LintError(_("Doctype not in PCDATA content model flag: %(name)s") % {"name": name})
if not isinstance(name, str):
raise LintError(_("Tag name is not a string: %(tag)r") % {"tag": name})
# XXX: what to do with token["data"] ?
elif type in ("ParseError", "SerializeError"):
pass
else:
raise LintError(_(u"Unknown token type: %s") % type)
raise LintError(_("Unknown token type: %(type)s") % {"type": type})
yield token

View File

@@ -1,4 +1,7 @@
import _base
from __future__ import absolute_import, division, unicode_literals
from . import _base
class Filter(_base.Filter):
def slider(self):
@@ -14,8 +17,8 @@ class Filter(_base.Filter):
for previous, token, next in self.slider():
type = token["type"]
if type == "StartTag":
if (token["data"] or
not self.is_optional_start(token["name"], previous, next)):
if (token["data"] or
not self.is_optional_start(token["name"], previous, next)):
yield token
elif type == "EndTag":
if not self.is_optional_end(token["name"], next):
@@ -73,7 +76,7 @@ class Filter(_base.Filter):
# omit the thead and tfoot elements' end tag when they are
# immediately followed by a tbody element. See is_optional_end.
if previous and previous['type'] == 'EndTag' and \
previous['name'] in ('tbody','thead','tfoot'):
previous['name'] in ('tbody', 'thead', 'tfoot'):
return False
return next["name"] == 'tr'
else:
@@ -121,10 +124,10 @@ class Filter(_base.Filter):
# there is no more content in the parent element.
if type in ("StartTag", "EmptyTag"):
return next["name"] in ('address', 'article', 'aside',
'blockquote', 'datagrid', 'dialog',
'blockquote', 'datagrid', 'dialog',
'dir', 'div', 'dl', 'fieldset', 'footer',
'form', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
'header', 'hr', 'menu', 'nav', 'ol',
'header', 'hr', 'menu', 'nav', 'ol',
'p', 'pre', 'section', 'table', 'ul')
else:
return type == "EndTag" or type is None

View File

@@ -1,8 +1,12 @@
import _base
from html5lib.sanitizer import HTMLSanitizerMixin
from __future__ import absolute_import, division, unicode_literals
from . import _base
from ..sanitizer import HTMLSanitizerMixin
class Filter(_base.Filter, HTMLSanitizerMixin):
def __iter__(self):
for token in _base.Filter.__iter__(self):
token = self.sanitize_token(token)
if token: yield token
if token:
yield token

View File

@@ -1,16 +1,13 @@
try:
frozenset
except NameError:
# Import from the sets module for python 2.3
from sets import ImmutableSet as frozenset
from __future__ import absolute_import, division, unicode_literals
import re
import _base
from html5lib.constants import rcdataElements, spaceCharacters
spaceCharacters = u"".join(spaceCharacters)
from . import _base
from ..constants import rcdataElements, spaceCharacters
spaceCharacters = "".join(spaceCharacters)
SPACES_REGEX = re.compile("[%s]+" % spaceCharacters)
SPACES_REGEX = re.compile(u"[%s]+" % spaceCharacters)
class Filter(_base.Filter):
@@ -21,7 +18,7 @@ class Filter(_base.Filter):
for token in _base.Filter.__iter__(self):
type = token["type"]
if type == "StartTag" \
and (preserve or token["name"] in self.spacePreserveElements):
and (preserve or token["name"] in self.spacePreserveElements):
preserve += 1
elif type == "EndTag" and preserve:
@@ -29,13 +26,13 @@ class Filter(_base.Filter):
elif not preserve and type == "SpaceCharacters" and token["data"]:
# Test on token["data"] above to not introduce spaces where there were not
token["data"] = u" "
token["data"] = " "
elif not preserve and type == "Characters":
token["data"] = collapse_spaces(token["data"])
yield token
def collapse_spaces(text):
return SPACES_REGEX.sub(' ', text)

File diff suppressed because it is too large Load Diff

View File

@@ -1,25 +1,105 @@
import re
from __future__ import absolute_import, division, unicode_literals
baseChar = """[#x0041-#x005A] | [#x0061-#x007A] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x00FF] | [#x0100-#x0131] | [#x0134-#x013E] | [#x0141-#x0148] | [#x014A-#x017E] | [#x0180-#x01C3] | [#x01CD-#x01F0] | [#x01F4-#x01F5] | [#x01FA-#x0217] | [#x0250-#x02A8] | [#x02BB-#x02C1] | #x0386 | [#x0388-#x038A] | #x038C | [#x038E-#x03A1] | [#x03A3-#x03CE] | [#x03D0-#x03D6] | #x03DA | #x03DC | #x03DE | #x03E0 | [#x03E2-#x03F3] | [#x0401-#x040C] | [#x040E-#x044F] | [#x0451-#x045C] | [#x045E-#x0481] | [#x0490-#x04C4] | [#x04C7-#x04C8] | [#x04CB-#x04CC] | [#x04D0-#x04EB] | [#x04EE-#x04F5] | [#x04F8-#x04F9] | [#x0531-#x0556] | #x0559 | [#x0561-#x0586] | [#x05D0-#x05EA] | [#x05F0-#x05F2] | [#x0621-#x063A] | [#x0641-#x064A] | [#x0671-#x06B7] | [#x06BA-#x06BE] | [#x06C0-#x06CE] | [#x06D0-#x06D3] | #x06D5 | [#x06E5-#x06E6] | [#x0905-#x0939] | #x093D | [#x0958-#x0961] | [#x0985-#x098C] | [#x098F-#x0990] | [#x0993-#x09A8] | [#x09AA-#x09B0] | #x09B2 | [#x09B6-#x09B9] | [#x09DC-#x09DD] | [#x09DF-#x09E1] | [#x09F0-#x09F1] | [#x0A05-#x0A0A] | [#x0A0F-#x0A10] | [#x0A13-#x0A28] | [#x0A2A-#x0A30] | [#x0A32-#x0A33] | [#x0A35-#x0A36] | [#x0A38-#x0A39] | [#x0A59-#x0A5C] | #x0A5E | [#x0A72-#x0A74] | [#x0A85-#x0A8B] | #x0A8D | [#x0A8F-#x0A91] | [#x0A93-#x0AA8] | [#x0AAA-#x0AB0] | [#x0AB2-#x0AB3] | [#x0AB5-#x0AB9] | #x0ABD | #x0AE0 | [#x0B05-#x0B0C] | [#x0B0F-#x0B10] | [#x0B13-#x0B28] | [#x0B2A-#x0B30] | [#x0B32-#x0B33] | [#x0B36-#x0B39] | #x0B3D | [#x0B5C-#x0B5D] | [#x0B5F-#x0B61] | [#x0B85-#x0B8A] | [#x0B8E-#x0B90] | [#x0B92-#x0B95] | [#x0B99-#x0B9A] | #x0B9C | [#x0B9E-#x0B9F] | [#x0BA3-#x0BA4] | [#x0BA8-#x0BAA] | [#x0BAE-#x0BB5] | [#x0BB7-#x0BB9] | [#x0C05-#x0C0C] | [#x0C0E-#x0C10] | [#x0C12-#x0C28] | [#x0C2A-#x0C33] | [#x0C35-#x0C39] | [#x0C60-#x0C61] | [#x0C85-#x0C8C] | [#x0C8E-#x0C90] | [#x0C92-#x0CA8] | [#x0CAA-#x0CB3] | [#x0CB5-#x0CB9] | #x0CDE | [#x0CE0-#x0CE1] | [#x0D05-#x0D0C] | [#x0D0E-#x0D10] | [#x0D12-#x0D28] | [#x0D2A-#x0D39] | [#x0D60-#x0D61] | [#x0E01-#x0E2E] | #x0E30 | [#x0E32-#x0E33] | [#x0E40-#x0E45] | [#x0E81-#x0E82] | #x0E84 | [#x0E87-#x0E88] | #x0E8A | #x0E8D | [#x0E94-#x0E97] | [#x0E99-#x0E9F] | [#x0EA1-#x0EA3] | #x0EA5 | #x0EA7 | [#x0EAA-#x0EAB] | [#x0EAD-#x0EAE] | #x0EB0 | [#x0EB2-#x0EB3] | #x0EBD | [#x0EC0-#x0EC4] | [#x0F40-#x0F47] | [#x0F49-#x0F69] | [#x10A0-#x10C5] | [#x10D0-#x10F6] | #x1100 | [#x1102-#x1103] | [#x1105-#x1107] | #x1109 | [#x110B-#x110C] | [#x110E-#x1112] | #x113C | #x113E | #x1140 | #x114C | #x114E | #x1150 | [#x1154-#x1155] | #x1159 | [#x115F-#x1161] | #x1163 | #x1165 | #x1167 | #x1169 | [#x116D-#x116E] | [#x1172-#x1173] | #x1175 | #x119E | #x11A8 | #x11AB | [#x11AE-#x11AF] | [#x11B7-#x11B8] | #x11BA | [#x11BC-#x11C2] | #x11EB | #x11F0 | #x11F9 | [#x1E00-#x1E9B] | [#x1EA0-#x1EF9] | [#x1F00-#x1F15] | [#x1F18-#x1F1D] | [#x1F20-#x1F45] | [#x1F48-#x1F4D] | [#x1F50-#x1F57] | #x1F59 | #x1F5B | #x1F5D | [#x1F5F-#x1F7D] | [#x1F80-#x1FB4] | [#x1FB6-#x1FBC] | #x1FBE | [#x1FC2-#x1FC4] | [#x1FC6-#x1FCC] | [#x1FD0-#x1FD3] | [#x1FD6-#x1FDB] | [#x1FE0-#x1FEC] | [#x1FF2-#x1FF4] | [#x1FF6-#x1FFC] | #x2126 | [#x212A-#x212B] | #x212E | [#x2180-#x2182] | [#x3041-#x3094] | [#x30A1-#x30FA] | [#x3105-#x312C] | [#xAC00-#xD7A3]"""
import re
import warnings
from .constants import DataLossWarning
baseChar = """
[#x0041-#x005A] | [#x0061-#x007A] | [#x00C0-#x00D6] | [#x00D8-#x00F6] |
[#x00F8-#x00FF] | [#x0100-#x0131] | [#x0134-#x013E] | [#x0141-#x0148] |
[#x014A-#x017E] | [#x0180-#x01C3] | [#x01CD-#x01F0] | [#x01F4-#x01F5] |
[#x01FA-#x0217] | [#x0250-#x02A8] | [#x02BB-#x02C1] | #x0386 |
[#x0388-#x038A] | #x038C | [#x038E-#x03A1] | [#x03A3-#x03CE] |
[#x03D0-#x03D6] | #x03DA | #x03DC | #x03DE | #x03E0 | [#x03E2-#x03F3] |
[#x0401-#x040C] | [#x040E-#x044F] | [#x0451-#x045C] | [#x045E-#x0481] |
[#x0490-#x04C4] | [#x04C7-#x04C8] | [#x04CB-#x04CC] | [#x04D0-#x04EB] |
[#x04EE-#x04F5] | [#x04F8-#x04F9] | [#x0531-#x0556] | #x0559 |
[#x0561-#x0586] | [#x05D0-#x05EA] | [#x05F0-#x05F2] | [#x0621-#x063A] |
[#x0641-#x064A] | [#x0671-#x06B7] | [#x06BA-#x06BE] | [#x06C0-#x06CE] |
[#x06D0-#x06D3] | #x06D5 | [#x06E5-#x06E6] | [#x0905-#x0939] | #x093D |
[#x0958-#x0961] | [#x0985-#x098C] | [#x098F-#x0990] | [#x0993-#x09A8] |
[#x09AA-#x09B0] | #x09B2 | [#x09B6-#x09B9] | [#x09DC-#x09DD] |
[#x09DF-#x09E1] | [#x09F0-#x09F1] | [#x0A05-#x0A0A] | [#x0A0F-#x0A10] |
[#x0A13-#x0A28] | [#x0A2A-#x0A30] | [#x0A32-#x0A33] | [#x0A35-#x0A36] |
[#x0A38-#x0A39] | [#x0A59-#x0A5C] | #x0A5E | [#x0A72-#x0A74] |
[#x0A85-#x0A8B] | #x0A8D | [#x0A8F-#x0A91] | [#x0A93-#x0AA8] |
[#x0AAA-#x0AB0] | [#x0AB2-#x0AB3] | [#x0AB5-#x0AB9] | #x0ABD | #x0AE0 |
[#x0B05-#x0B0C] | [#x0B0F-#x0B10] | [#x0B13-#x0B28] | [#x0B2A-#x0B30] |
[#x0B32-#x0B33] | [#x0B36-#x0B39] | #x0B3D | [#x0B5C-#x0B5D] |
[#x0B5F-#x0B61] | [#x0B85-#x0B8A] | [#x0B8E-#x0B90] | [#x0B92-#x0B95] |
[#x0B99-#x0B9A] | #x0B9C | [#x0B9E-#x0B9F] | [#x0BA3-#x0BA4] |
[#x0BA8-#x0BAA] | [#x0BAE-#x0BB5] | [#x0BB7-#x0BB9] | [#x0C05-#x0C0C] |
[#x0C0E-#x0C10] | [#x0C12-#x0C28] | [#x0C2A-#x0C33] | [#x0C35-#x0C39] |
[#x0C60-#x0C61] | [#x0C85-#x0C8C] | [#x0C8E-#x0C90] | [#x0C92-#x0CA8] |
[#x0CAA-#x0CB3] | [#x0CB5-#x0CB9] | #x0CDE | [#x0CE0-#x0CE1] |
[#x0D05-#x0D0C] | [#x0D0E-#x0D10] | [#x0D12-#x0D28] | [#x0D2A-#x0D39] |
[#x0D60-#x0D61] | [#x0E01-#x0E2E] | #x0E30 | [#x0E32-#x0E33] |
[#x0E40-#x0E45] | [#x0E81-#x0E82] | #x0E84 | [#x0E87-#x0E88] | #x0E8A |
#x0E8D | [#x0E94-#x0E97] | [#x0E99-#x0E9F] | [#x0EA1-#x0EA3] | #x0EA5 |
#x0EA7 | [#x0EAA-#x0EAB] | [#x0EAD-#x0EAE] | #x0EB0 | [#x0EB2-#x0EB3] |
#x0EBD | [#x0EC0-#x0EC4] | [#x0F40-#x0F47] | [#x0F49-#x0F69] |
[#x10A0-#x10C5] | [#x10D0-#x10F6] | #x1100 | [#x1102-#x1103] |
[#x1105-#x1107] | #x1109 | [#x110B-#x110C] | [#x110E-#x1112] | #x113C |
#x113E | #x1140 | #x114C | #x114E | #x1150 | [#x1154-#x1155] | #x1159 |
[#x115F-#x1161] | #x1163 | #x1165 | #x1167 | #x1169 | [#x116D-#x116E] |
[#x1172-#x1173] | #x1175 | #x119E | #x11A8 | #x11AB | [#x11AE-#x11AF] |
[#x11B7-#x11B8] | #x11BA | [#x11BC-#x11C2] | #x11EB | #x11F0 | #x11F9 |
[#x1E00-#x1E9B] | [#x1EA0-#x1EF9] | [#x1F00-#x1F15] | [#x1F18-#x1F1D] |
[#x1F20-#x1F45] | [#x1F48-#x1F4D] | [#x1F50-#x1F57] | #x1F59 | #x1F5B |
#x1F5D | [#x1F5F-#x1F7D] | [#x1F80-#x1FB4] | [#x1FB6-#x1FBC] | #x1FBE |
[#x1FC2-#x1FC4] | [#x1FC6-#x1FCC] | [#x1FD0-#x1FD3] | [#x1FD6-#x1FDB] |
[#x1FE0-#x1FEC] | [#x1FF2-#x1FF4] | [#x1FF6-#x1FFC] | #x2126 |
[#x212A-#x212B] | #x212E | [#x2180-#x2182] | [#x3041-#x3094] |
[#x30A1-#x30FA] | [#x3105-#x312C] | [#xAC00-#xD7A3]"""
ideographic = """[#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]"""
combiningCharacter = """[#x0300-#x0345] | [#x0360-#x0361] | [#x0483-#x0486] | [#x0591-#x05A1] | [#x05A3-#x05B9] | [#x05BB-#x05BD] | #x05BF | [#x05C1-#x05C2] | #x05C4 | [#x064B-#x0652] | #x0670 | [#x06D6-#x06DC] | [#x06DD-#x06DF] | [#x06E0-#x06E4] | [#x06E7-#x06E8] | [#x06EA-#x06ED] | [#x0901-#x0903] | #x093C | [#x093E-#x094C] | #x094D | [#x0951-#x0954] | [#x0962-#x0963] | [#x0981-#x0983] | #x09BC | #x09BE | #x09BF | [#x09C0-#x09C4] | [#x09C7-#x09C8] | [#x09CB-#x09CD] | #x09D7 | [#x09E2-#x09E3] | #x0A02 | #x0A3C | #x0A3E | #x0A3F | [#x0A40-#x0A42] | [#x0A47-#x0A48] | [#x0A4B-#x0A4D] | [#x0A70-#x0A71] | [#x0A81-#x0A83] | #x0ABC | [#x0ABE-#x0AC5] | [#x0AC7-#x0AC9] | [#x0ACB-#x0ACD] | [#x0B01-#x0B03] | #x0B3C | [#x0B3E-#x0B43] | [#x0B47-#x0B48] | [#x0B4B-#x0B4D] | [#x0B56-#x0B57] | [#x0B82-#x0B83] | [#x0BBE-#x0BC2] | [#x0BC6-#x0BC8] | [#x0BCA-#x0BCD] | #x0BD7 | [#x0C01-#x0C03] | [#x0C3E-#x0C44] | [#x0C46-#x0C48] | [#x0C4A-#x0C4D] | [#x0C55-#x0C56] | [#x0C82-#x0C83] | [#x0CBE-#x0CC4] | [#x0CC6-#x0CC8] | [#x0CCA-#x0CCD] | [#x0CD5-#x0CD6] | [#x0D02-#x0D03] | [#x0D3E-#x0D43] | [#x0D46-#x0D48] | [#x0D4A-#x0D4D] | #x0D57 | #x0E31 | [#x0E34-#x0E3A] | [#x0E47-#x0E4E] | #x0EB1 | [#x0EB4-#x0EB9] | [#x0EBB-#x0EBC] | [#x0EC8-#x0ECD] | [#x0F18-#x0F19] | #x0F35 | #x0F37 | #x0F39 | #x0F3E | #x0F3F | [#x0F71-#x0F84] | [#x0F86-#x0F8B] | [#x0F90-#x0F95] | #x0F97 | [#x0F99-#x0FAD] | [#x0FB1-#x0FB7] | #x0FB9 | [#x20D0-#x20DC] | #x20E1 | [#x302A-#x302F] | #x3099 | #x309A"""
combiningCharacter = """
[#x0300-#x0345] | [#x0360-#x0361] | [#x0483-#x0486] | [#x0591-#x05A1] |
[#x05A3-#x05B9] | [#x05BB-#x05BD] | #x05BF | [#x05C1-#x05C2] | #x05C4 |
[#x064B-#x0652] | #x0670 | [#x06D6-#x06DC] | [#x06DD-#x06DF] |
[#x06E0-#x06E4] | [#x06E7-#x06E8] | [#x06EA-#x06ED] | [#x0901-#x0903] |
#x093C | [#x093E-#x094C] | #x094D | [#x0951-#x0954] | [#x0962-#x0963] |
[#x0981-#x0983] | #x09BC | #x09BE | #x09BF | [#x09C0-#x09C4] |
[#x09C7-#x09C8] | [#x09CB-#x09CD] | #x09D7 | [#x09E2-#x09E3] | #x0A02 |
#x0A3C | #x0A3E | #x0A3F | [#x0A40-#x0A42] | [#x0A47-#x0A48] |
[#x0A4B-#x0A4D] | [#x0A70-#x0A71] | [#x0A81-#x0A83] | #x0ABC |
[#x0ABE-#x0AC5] | [#x0AC7-#x0AC9] | [#x0ACB-#x0ACD] | [#x0B01-#x0B03] |
#x0B3C | [#x0B3E-#x0B43] | [#x0B47-#x0B48] | [#x0B4B-#x0B4D] |
[#x0B56-#x0B57] | [#x0B82-#x0B83] | [#x0BBE-#x0BC2] | [#x0BC6-#x0BC8] |
[#x0BCA-#x0BCD] | #x0BD7 | [#x0C01-#x0C03] | [#x0C3E-#x0C44] |
[#x0C46-#x0C48] | [#x0C4A-#x0C4D] | [#x0C55-#x0C56] | [#x0C82-#x0C83] |
[#x0CBE-#x0CC4] | [#x0CC6-#x0CC8] | [#x0CCA-#x0CCD] | [#x0CD5-#x0CD6] |
[#x0D02-#x0D03] | [#x0D3E-#x0D43] | [#x0D46-#x0D48] | [#x0D4A-#x0D4D] |
#x0D57 | #x0E31 | [#x0E34-#x0E3A] | [#x0E47-#x0E4E] | #x0EB1 |
[#x0EB4-#x0EB9] | [#x0EBB-#x0EBC] | [#x0EC8-#x0ECD] | [#x0F18-#x0F19] |
#x0F35 | #x0F37 | #x0F39 | #x0F3E | #x0F3F | [#x0F71-#x0F84] |
[#x0F86-#x0F8B] | [#x0F90-#x0F95] | #x0F97 | [#x0F99-#x0FAD] |
[#x0FB1-#x0FB7] | #x0FB9 | [#x20D0-#x20DC] | #x20E1 | [#x302A-#x302F] |
#x3099 | #x309A"""
digit = """[#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9] | [#x0966-#x096F] | [#x09E6-#x09EF] | [#x0A66-#x0A6F] | [#x0AE6-#x0AEF] | [#x0B66-#x0B6F] | [#x0BE7-#x0BEF] | [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] | [#x0D66-#x0D6F] | [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29]"""
digit = """
[#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9] | [#x0966-#x096F] |
[#x09E6-#x09EF] | [#x0A66-#x0A6F] | [#x0AE6-#x0AEF] | [#x0B66-#x0B6F] |
[#x0BE7-#x0BEF] | [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] | [#x0D66-#x0D6F] |
[#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29]"""
extender = """#x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6 | #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]"""
extender = """
#x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6 | #x3005 |
#[#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]"""
letter = " | ".join([baseChar, ideographic])
#Without the
name = " | ".join([letter, digit, ".", "-", "_", combiningCharacter,
extender])
# Without the
name = " | ".join([letter, digit, ".", "-", "_", combiningCharacter,
extender])
nameFirst = " | ".join([letter, "_"])
reChar = re.compile(r"#x([\d|A-F]{4,4})")
reCharRange = re.compile(r"\[#x([\d|A-F]{4,4})-#x([\d|A-F]{4,4})\]")
def charStringToList(chars):
charRanges = [item.strip() for item in chars.split(" | ")]
rv = []
@@ -30,16 +110,17 @@ def charStringToList(chars):
if match is not None:
rv.append([hexToInt(item) for item in match.groups()])
if len(rv[-1]) == 1:
rv[-1] = rv[-1]*2
rv[-1] = rv[-1] * 2
foundMatch = True
break
if not foundMatch:
assert len(item) == 1
rv.append([ord(item)] * 2)
rv = normaliseCharList(rv)
return rv
def normaliseCharList(charList):
charList = sorted(charList)
for item in charList:
@@ -49,61 +130,69 @@ def normaliseCharList(charList):
while i < len(charList):
j = 1
rv.append(charList[i])
while i + j < len(charList) and charList[i+j][0] <= rv[-1][1] + 1:
rv[-1][1] = charList[i+j][1]
while i + j < len(charList) and charList[i + j][0] <= rv[-1][1] + 1:
rv[-1][1] = charList[i + j][1]
j += 1
i += j
return rv
#We don't really support characters above the BMP :(
# We don't really support characters above the BMP :(
max_unicode = int("FFFF", 16)
def missingRanges(charList):
rv = []
if charList[0] != 0:
rv.append([0, charList[0][0] - 1])
for i, item in enumerate(charList[:-1]):
rv.append([item[1]+1, charList[i+1][0] - 1])
rv.append([item[1] + 1, charList[i + 1][0] - 1])
if charList[-1][1] != max_unicode:
rv.append([charList[-1][1] + 1, max_unicode])
return rv
def listToRegexpStr(charList):
rv = []
for item in charList:
if item[0] == item[1]:
rv.append(escapeRegexp(unichr(item[0])))
rv.append(escapeRegexp(chr(item[0])))
else:
rv.append(escapeRegexp(unichr(item[0])) + "-" +
escapeRegexp(unichr(item[1])))
return "[%s]"%"".join(rv)
rv.append(escapeRegexp(chr(item[0])) + "-" +
escapeRegexp(chr(item[1])))
return "[%s]" % "".join(rv)
def hexToInt(hex_str):
return int(hex_str, 16)
def escapeRegexp(string):
specialCharacters = (".", "^", "$", "*", "+", "?", "{", "}",
"[", "]", "|", "(", ")", "-")
"[", "]", "|", "(", ")", "-")
for char in specialCharacters:
string = string.replace(char, "\\" + char)
if char in string:
print string
return string
#output from the above
nonXmlNameBMPRegexp = re.compile(u'[\x00-,/:-@\\[-\\^`\\{-\xb6\xb8-\xbf\xd7\xf7\u0132-\u0133\u013f-\u0140\u0149\u017f\u01c4-\u01cc\u01f1-\u01f3\u01f6-\u01f9\u0218-\u024f\u02a9-\u02ba\u02c2-\u02cf\u02d2-\u02ff\u0346-\u035f\u0362-\u0385\u038b\u038d\u03a2\u03cf\u03d7-\u03d9\u03db\u03dd\u03df\u03e1\u03f4-\u0400\u040d\u0450\u045d\u0482\u0487-\u048f\u04c5-\u04c6\u04c9-\u04ca\u04cd-\u04cf\u04ec-\u04ed\u04f6-\u04f7\u04fa-\u0530\u0557-\u0558\u055a-\u0560\u0587-\u0590\u05a2\u05ba\u05be\u05c0\u05c3\u05c5-\u05cf\u05eb-\u05ef\u05f3-\u0620\u063b-\u063f\u0653-\u065f\u066a-\u066f\u06b8-\u06b9\u06bf\u06cf\u06d4\u06e9\u06ee-\u06ef\u06fa-\u0900\u0904\u093a-\u093b\u094e-\u0950\u0955-\u0957\u0964-\u0965\u0970-\u0980\u0984\u098d-\u098e\u0991-\u0992\u09a9\u09b1\u09b3-\u09b5\u09ba-\u09bb\u09bd\u09c5-\u09c6\u09c9-\u09ca\u09ce-\u09d6\u09d8-\u09db\u09de\u09e4-\u09e5\u09f2-\u0a01\u0a03-\u0a04\u0a0b-\u0a0e\u0a11-\u0a12\u0a29\u0a31\u0a34\u0a37\u0a3a-\u0a3b\u0a3d\u0a43-\u0a46\u0a49-\u0a4a\u0a4e-\u0a58\u0a5d\u0a5f-\u0a65\u0a75-\u0a80\u0a84\u0a8c\u0a8e\u0a92\u0aa9\u0ab1\u0ab4\u0aba-\u0abb\u0ac6\u0aca\u0ace-\u0adf\u0ae1-\u0ae5\u0af0-\u0b00\u0b04\u0b0d-\u0b0e\u0b11-\u0b12\u0b29\u0b31\u0b34-\u0b35\u0b3a-\u0b3b\u0b44-\u0b46\u0b49-\u0b4a\u0b4e-\u0b55\u0b58-\u0b5b\u0b5e\u0b62-\u0b65\u0b70-\u0b81\u0b84\u0b8b-\u0b8d\u0b91\u0b96-\u0b98\u0b9b\u0b9d\u0ba0-\u0ba2\u0ba5-\u0ba7\u0bab-\u0bad\u0bb6\u0bba-\u0bbd\u0bc3-\u0bc5\u0bc9\u0bce-\u0bd6\u0bd8-\u0be6\u0bf0-\u0c00\u0c04\u0c0d\u0c11\u0c29\u0c34\u0c3a-\u0c3d\u0c45\u0c49\u0c4e-\u0c54\u0c57-\u0c5f\u0c62-\u0c65\u0c70-\u0c81\u0c84\u0c8d\u0c91\u0ca9\u0cb4\u0cba-\u0cbd\u0cc5\u0cc9\u0cce-\u0cd4\u0cd7-\u0cdd\u0cdf\u0ce2-\u0ce5\u0cf0-\u0d01\u0d04\u0d0d\u0d11\u0d29\u0d3a-\u0d3d\u0d44-\u0d45\u0d49\u0d4e-\u0d56\u0d58-\u0d5f\u0d62-\u0d65\u0d70-\u0e00\u0e2f\u0e3b-\u0e3f\u0e4f\u0e5a-\u0e80\u0e83\u0e85-\u0e86\u0e89\u0e8b-\u0e8c\u0e8e-\u0e93\u0e98\u0ea0\u0ea4\u0ea6\u0ea8-\u0ea9\u0eac\u0eaf\u0eba\u0ebe-\u0ebf\u0ec5\u0ec7\u0ece-\u0ecf\u0eda-\u0f17\u0f1a-\u0f1f\u0f2a-\u0f34\u0f36\u0f38\u0f3a-\u0f3d\u0f48\u0f6a-\u0f70\u0f85\u0f8c-\u0f8f\u0f96\u0f98\u0fae-\u0fb0\u0fb8\u0fba-\u109f\u10c6-\u10cf\u10f7-\u10ff\u1101\u1104\u1108\u110a\u110d\u1113-\u113b\u113d\u113f\u1141-\u114b\u114d\u114f\u1151-\u1153\u1156-\u1158\u115a-\u115e\u1162\u1164\u1166\u1168\u116a-\u116c\u116f-\u1171\u1174\u1176-\u119d\u119f-\u11a7\u11a9-\u11aa\u11ac-\u11ad\u11b0-\u11b6\u11b9\u11bb\u11c3-\u11ea\u11ec-\u11ef\u11f1-\u11f8\u11fa-\u1dff\u1e9c-\u1e9f\u1efa-\u1eff\u1f16-\u1f17\u1f1e-\u1f1f\u1f46-\u1f47\u1f4e-\u1f4f\u1f58\u1f5a\u1f5c\u1f5e\u1f7e-\u1f7f\u1fb5\u1fbd\u1fbf-\u1fc1\u1fc5\u1fcd-\u1fcf\u1fd4-\u1fd5\u1fdc-\u1fdf\u1fed-\u1ff1\u1ff5\u1ffd-\u20cf\u20dd-\u20e0\u20e2-\u2125\u2127-\u2129\u212c-\u212d\u212f-\u217f\u2183-\u3004\u3006\u3008-\u3020\u3030\u3036-\u3040\u3095-\u3098\u309b-\u309c\u309f-\u30a0\u30fb\u30ff-\u3104\u312d-\u4dff\u9fa6-\uabff\ud7a4-\uffff]')
# output from the above
nonXmlNameBMPRegexp = re.compile('[\x00-,/:-@\\[-\\^`\\{-\xb6\xb8-\xbf\xd7\xf7\u0132-\u0133\u013f-\u0140\u0149\u017f\u01c4-\u01cc\u01f1-\u01f3\u01f6-\u01f9\u0218-\u024f\u02a9-\u02ba\u02c2-\u02cf\u02d2-\u02ff\u0346-\u035f\u0362-\u0385\u038b\u038d\u03a2\u03cf\u03d7-\u03d9\u03db\u03dd\u03df\u03e1\u03f4-\u0400\u040d\u0450\u045d\u0482\u0487-\u048f\u04c5-\u04c6\u04c9-\u04ca\u04cd-\u04cf\u04ec-\u04ed\u04f6-\u04f7\u04fa-\u0530\u0557-\u0558\u055a-\u0560\u0587-\u0590\u05a2\u05ba\u05be\u05c0\u05c3\u05c5-\u05cf\u05eb-\u05ef\u05f3-\u0620\u063b-\u063f\u0653-\u065f\u066a-\u066f\u06b8-\u06b9\u06bf\u06cf\u06d4\u06e9\u06ee-\u06ef\u06fa-\u0900\u0904\u093a-\u093b\u094e-\u0950\u0955-\u0957\u0964-\u0965\u0970-\u0980\u0984\u098d-\u098e\u0991-\u0992\u09a9\u09b1\u09b3-\u09b5\u09ba-\u09bb\u09bd\u09c5-\u09c6\u09c9-\u09ca\u09ce-\u09d6\u09d8-\u09db\u09de\u09e4-\u09e5\u09f2-\u0a01\u0a03-\u0a04\u0a0b-\u0a0e\u0a11-\u0a12\u0a29\u0a31\u0a34\u0a37\u0a3a-\u0a3b\u0a3d\u0a43-\u0a46\u0a49-\u0a4a\u0a4e-\u0a58\u0a5d\u0a5f-\u0a65\u0a75-\u0a80\u0a84\u0a8c\u0a8e\u0a92\u0aa9\u0ab1\u0ab4\u0aba-\u0abb\u0ac6\u0aca\u0ace-\u0adf\u0ae1-\u0ae5\u0af0-\u0b00\u0b04\u0b0d-\u0b0e\u0b11-\u0b12\u0b29\u0b31\u0b34-\u0b35\u0b3a-\u0b3b\u0b44-\u0b46\u0b49-\u0b4a\u0b4e-\u0b55\u0b58-\u0b5b\u0b5e\u0b62-\u0b65\u0b70-\u0b81\u0b84\u0b8b-\u0b8d\u0b91\u0b96-\u0b98\u0b9b\u0b9d\u0ba0-\u0ba2\u0ba5-\u0ba7\u0bab-\u0bad\u0bb6\u0bba-\u0bbd\u0bc3-\u0bc5\u0bc9\u0bce-\u0bd6\u0bd8-\u0be6\u0bf0-\u0c00\u0c04\u0c0d\u0c11\u0c29\u0c34\u0c3a-\u0c3d\u0c45\u0c49\u0c4e-\u0c54\u0c57-\u0c5f\u0c62-\u0c65\u0c70-\u0c81\u0c84\u0c8d\u0c91\u0ca9\u0cb4\u0cba-\u0cbd\u0cc5\u0cc9\u0cce-\u0cd4\u0cd7-\u0cdd\u0cdf\u0ce2-\u0ce5\u0cf0-\u0d01\u0d04\u0d0d\u0d11\u0d29\u0d3a-\u0d3d\u0d44-\u0d45\u0d49\u0d4e-\u0d56\u0d58-\u0d5f\u0d62-\u0d65\u0d70-\u0e00\u0e2f\u0e3b-\u0e3f\u0e4f\u0e5a-\u0e80\u0e83\u0e85-\u0e86\u0e89\u0e8b-\u0e8c\u0e8e-\u0e93\u0e98\u0ea0\u0ea4\u0ea6\u0ea8-\u0ea9\u0eac\u0eaf\u0eba\u0ebe-\u0ebf\u0ec5\u0ec7\u0ece-\u0ecf\u0eda-\u0f17\u0f1a-\u0f1f\u0f2a-\u0f34\u0f36\u0f38\u0f3a-\u0f3d\u0f48\u0f6a-\u0f70\u0f85\u0f8c-\u0f8f\u0f96\u0f98\u0fae-\u0fb0\u0fb8\u0fba-\u109f\u10c6-\u10cf\u10f7-\u10ff\u1101\u1104\u1108\u110a\u110d\u1113-\u113b\u113d\u113f\u1141-\u114b\u114d\u114f\u1151-\u1153\u1156-\u1158\u115a-\u115e\u1162\u1164\u1166\u1168\u116a-\u116c\u116f-\u1171\u1174\u1176-\u119d\u119f-\u11a7\u11a9-\u11aa\u11ac-\u11ad\u11b0-\u11b6\u11b9\u11bb\u11c3-\u11ea\u11ec-\u11ef\u11f1-\u11f8\u11fa-\u1dff\u1e9c-\u1e9f\u1efa-\u1eff\u1f16-\u1f17\u1f1e-\u1f1f\u1f46-\u1f47\u1f4e-\u1f4f\u1f58\u1f5a\u1f5c\u1f5e\u1f7e-\u1f7f\u1fb5\u1fbd\u1fbf-\u1fc1\u1fc5\u1fcd-\u1fcf\u1fd4-\u1fd5\u1fdc-\u1fdf\u1fed-\u1ff1\u1ff5\u1ffd-\u20cf\u20dd-\u20e0\u20e2-\u2125\u2127-\u2129\u212c-\u212d\u212f-\u217f\u2183-\u3004\u3006\u3008-\u3020\u3030\u3036-\u3040\u3095-\u3098\u309b-\u309c\u309f-\u30a0\u30fb\u30ff-\u3104\u312d-\u4dff\u9fa6-\uabff\ud7a4-\uffff]')
nonXmlNameFirstBMPRegexp = re.compile('[\x00-@\\[-\\^`\\{-\xbf\xd7\xf7\u0132-\u0133\u013f-\u0140\u0149\u017f\u01c4-\u01cc\u01f1-\u01f3\u01f6-\u01f9\u0218-\u024f\u02a9-\u02ba\u02c2-\u0385\u0387\u038b\u038d\u03a2\u03cf\u03d7-\u03d9\u03db\u03dd\u03df\u03e1\u03f4-\u0400\u040d\u0450\u045d\u0482-\u048f\u04c5-\u04c6\u04c9-\u04ca\u04cd-\u04cf\u04ec-\u04ed\u04f6-\u04f7\u04fa-\u0530\u0557-\u0558\u055a-\u0560\u0587-\u05cf\u05eb-\u05ef\u05f3-\u0620\u063b-\u0640\u064b-\u0670\u06b8-\u06b9\u06bf\u06cf\u06d4\u06d6-\u06e4\u06e7-\u0904\u093a-\u093c\u093e-\u0957\u0962-\u0984\u098d-\u098e\u0991-\u0992\u09a9\u09b1\u09b3-\u09b5\u09ba-\u09db\u09de\u09e2-\u09ef\u09f2-\u0a04\u0a0b-\u0a0e\u0a11-\u0a12\u0a29\u0a31\u0a34\u0a37\u0a3a-\u0a58\u0a5d\u0a5f-\u0a71\u0a75-\u0a84\u0a8c\u0a8e\u0a92\u0aa9\u0ab1\u0ab4\u0aba-\u0abc\u0abe-\u0adf\u0ae1-\u0b04\u0b0d-\u0b0e\u0b11-\u0b12\u0b29\u0b31\u0b34-\u0b35\u0b3a-\u0b3c\u0b3e-\u0b5b\u0b5e\u0b62-\u0b84\u0b8b-\u0b8d\u0b91\u0b96-\u0b98\u0b9b\u0b9d\u0ba0-\u0ba2\u0ba5-\u0ba7\u0bab-\u0bad\u0bb6\u0bba-\u0c04\u0c0d\u0c11\u0c29\u0c34\u0c3a-\u0c5f\u0c62-\u0c84\u0c8d\u0c91\u0ca9\u0cb4\u0cba-\u0cdd\u0cdf\u0ce2-\u0d04\u0d0d\u0d11\u0d29\u0d3a-\u0d5f\u0d62-\u0e00\u0e2f\u0e31\u0e34-\u0e3f\u0e46-\u0e80\u0e83\u0e85-\u0e86\u0e89\u0e8b-\u0e8c\u0e8e-\u0e93\u0e98\u0ea0\u0ea4\u0ea6\u0ea8-\u0ea9\u0eac\u0eaf\u0eb1\u0eb4-\u0ebc\u0ebe-\u0ebf\u0ec5-\u0f3f\u0f48\u0f6a-\u109f\u10c6-\u10cf\u10f7-\u10ff\u1101\u1104\u1108\u110a\u110d\u1113-\u113b\u113d\u113f\u1141-\u114b\u114d\u114f\u1151-\u1153\u1156-\u1158\u115a-\u115e\u1162\u1164\u1166\u1168\u116a-\u116c\u116f-\u1171\u1174\u1176-\u119d\u119f-\u11a7\u11a9-\u11aa\u11ac-\u11ad\u11b0-\u11b6\u11b9\u11bb\u11c3-\u11ea\u11ec-\u11ef\u11f1-\u11f8\u11fa-\u1dff\u1e9c-\u1e9f\u1efa-\u1eff\u1f16-\u1f17\u1f1e-\u1f1f\u1f46-\u1f47\u1f4e-\u1f4f\u1f58\u1f5a\u1f5c\u1f5e\u1f7e-\u1f7f\u1fb5\u1fbd\u1fbf-\u1fc1\u1fc5\u1fcd-\u1fcf\u1fd4-\u1fd5\u1fdc-\u1fdf\u1fed-\u1ff1\u1ff5\u1ffd-\u2125\u2127-\u2129\u212c-\u212d\u212f-\u217f\u2183-\u3006\u3008-\u3020\u302a-\u3040\u3095-\u30a0\u30fb-\u3104\u312d-\u4dff\u9fa6-\uabff\ud7a4-\uffff]')
# Simpler things
nonPubidCharRegexp = re.compile("[^\x20\x0D\x0Aa-zA-Z0-9\-\'()+,./:=?;!*#@$_%]")
nonXmlNameFirstBMPRegexp = re.compile(u'[\x00-@\\[-\\^`\\{-\xbf\xd7\xf7\u0132-\u0133\u013f-\u0140\u0149\u017f\u01c4-\u01cc\u01f1-\u01f3\u01f6-\u01f9\u0218-\u024f\u02a9-\u02ba\u02c2-\u0385\u0387\u038b\u038d\u03a2\u03cf\u03d7-\u03d9\u03db\u03dd\u03df\u03e1\u03f4-\u0400\u040d\u0450\u045d\u0482-\u048f\u04c5-\u04c6\u04c9-\u04ca\u04cd-\u04cf\u04ec-\u04ed\u04f6-\u04f7\u04fa-\u0530\u0557-\u0558\u055a-\u0560\u0587-\u05cf\u05eb-\u05ef\u05f3-\u0620\u063b-\u0640\u064b-\u0670\u06b8-\u06b9\u06bf\u06cf\u06d4\u06d6-\u06e4\u06e7-\u0904\u093a-\u093c\u093e-\u0957\u0962-\u0984\u098d-\u098e\u0991-\u0992\u09a9\u09b1\u09b3-\u09b5\u09ba-\u09db\u09de\u09e2-\u09ef\u09f2-\u0a04\u0a0b-\u0a0e\u0a11-\u0a12\u0a29\u0a31\u0a34\u0a37\u0a3a-\u0a58\u0a5d\u0a5f-\u0a71\u0a75-\u0a84\u0a8c\u0a8e\u0a92\u0aa9\u0ab1\u0ab4\u0aba-\u0abc\u0abe-\u0adf\u0ae1-\u0b04\u0b0d-\u0b0e\u0b11-\u0b12\u0b29\u0b31\u0b34-\u0b35\u0b3a-\u0b3c\u0b3e-\u0b5b\u0b5e\u0b62-\u0b84\u0b8b-\u0b8d\u0b91\u0b96-\u0b98\u0b9b\u0b9d\u0ba0-\u0ba2\u0ba5-\u0ba7\u0bab-\u0bad\u0bb6\u0bba-\u0c04\u0c0d\u0c11\u0c29\u0c34\u0c3a-\u0c5f\u0c62-\u0c84\u0c8d\u0c91\u0ca9\u0cb4\u0cba-\u0cdd\u0cdf\u0ce2-\u0d04\u0d0d\u0d11\u0d29\u0d3a-\u0d5f\u0d62-\u0e00\u0e2f\u0e31\u0e34-\u0e3f\u0e46-\u0e80\u0e83\u0e85-\u0e86\u0e89\u0e8b-\u0e8c\u0e8e-\u0e93\u0e98\u0ea0\u0ea4\u0ea6\u0ea8-\u0ea9\u0eac\u0eaf\u0eb1\u0eb4-\u0ebc\u0ebe-\u0ebf\u0ec5-\u0f3f\u0f48\u0f6a-\u109f\u10c6-\u10cf\u10f7-\u10ff\u1101\u1104\u1108\u110a\u110d\u1113-\u113b\u113d\u113f\u1141-\u114b\u114d\u114f\u1151-\u1153\u1156-\u1158\u115a-\u115e\u1162\u1164\u1166\u1168\u116a-\u116c\u116f-\u1171\u1174\u1176-\u119d\u119f-\u11a7\u11a9-\u11aa\u11ac-\u11ad\u11b0-\u11b6\u11b9\u11bb\u11c3-\u11ea\u11ec-\u11ef\u11f1-\u11f8\u11fa-\u1dff\u1e9c-\u1e9f\u1efa-\u1eff\u1f16-\u1f17\u1f1e-\u1f1f\u1f46-\u1f47\u1f4e-\u1f4f\u1f58\u1f5a\u1f5c\u1f5e\u1f7e-\u1f7f\u1fb5\u1fbd\u1fbf-\u1fc1\u1fc5\u1fcd-\u1fcf\u1fd4-\u1fd5\u1fdc-\u1fdf\u1fed-\u1ff1\u1ff5\u1ffd-\u2125\u2127-\u2129\u212c-\u212d\u212f-\u217f\u2183-\u3006\u3008-\u3020\u302a-\u3040\u3095-\u30a0\u30fb-\u3104\u312d-\u4dff\u9fa6-\uabff\ud7a4-\uffff]')
class InfosetFilter(object):
replacementRegexp = re.compile(r"U[\dA-F]{5,5}")
def __init__(self, replaceChars = None,
dropXmlnsLocalName = False,
dropXmlnsAttrNs = False,
preventDoubleDashComments = False,
preventDashAtCommentEnd = False,
replaceFormFeedCharacters = True):
def __init__(self, replaceChars=None,
dropXmlnsLocalName=False,
dropXmlnsAttrNs=False,
preventDoubleDashComments=False,
preventDashAtCommentEnd=False,
replaceFormFeedCharacters=True,
preventSingleQuotePubid=False):
self.dropXmlnsLocalName = dropXmlnsLocalName
self.dropXmlnsAttrNs = dropXmlnsAttrNs
@@ -113,14 +202,17 @@ class InfosetFilter(object):
self.replaceFormFeedCharacters = replaceFormFeedCharacters
self.preventSingleQuotePubid = preventSingleQuotePubid
self.replaceCache = {}
def coerceAttribute(self, name, namespace=None):
if self.dropXmlnsLocalName and name.startswith("xmlns:"):
#Need a datalosswarning here
warnings.warn("Attributes cannot begin with xmlns", DataLossWarning)
return None
elif (self.dropXmlnsAttrNs and
elif (self.dropXmlnsAttrNs and
namespace == "http://www.w3.org/2000/xmlns/"):
warnings.warn("Attributes cannot be in the xml namespace", DataLossWarning)
return None
else:
return self.toXmlName(name)
@@ -131,20 +223,35 @@ class InfosetFilter(object):
def coerceComment(self, data):
if self.preventDoubleDashComments:
while "--" in data:
warnings.warn("Comments cannot contain adjacent dashes", DataLossWarning)
data = data.replace("--", "- -")
return data
def coerceCharacters(self, data):
if self.replaceFormFeedCharacters:
for i in range(data.count("\x0C")):
warnings.warn("Text cannot contain U+000C", DataLossWarning)
data = data.replace("\x0C", " ")
#Other non-xml characters
# Other non-xml characters
return data
def coercePubid(self, data):
dataOutput = data
for char in nonPubidCharRegexp.findall(data):
warnings.warn("Coercing non-XML pubid", DataLossWarning)
replacement = self.getReplacementCharacter(char)
dataOutput = dataOutput.replace(char, replacement)
if self.preventSingleQuotePubid and dataOutput.find("'") >= 0:
warnings.warn("Pubid cannot contain single quote", DataLossWarning)
dataOutput = dataOutput.replace("'", self.getReplacementCharacter("'"))
return dataOutput
def toXmlName(self, name):
nameFirst = name[0]
nameRest = name[1:]
m = nonXmlNameFirstBMPRegexp.match(nameFirst)
if m:
warnings.warn("Coercing non-XML name", DataLossWarning)
nameFirstOutput = self.getReplacementCharacter(nameFirst)
else:
nameFirstOutput = nameFirst
@@ -152,10 +259,11 @@ class InfosetFilter(object):
nameRestOutput = nameRest
replaceChars = set(nonXmlNameBMPRegexp.findall(nameRest))
for char in replaceChars:
warnings.warn("Coercing non-XML name", DataLossWarning)
replacement = self.getReplacementCharacter(char)
nameRestOutput = nameRestOutput.replace(char, replacement)
return nameFirstOutput + nameRestOutput
def getReplacementCharacter(self, char):
if char in self.replaceCache:
replacement = self.replaceCache[char]
@@ -169,9 +277,9 @@ class InfosetFilter(object):
return name
def escapeChar(self, char):
replacement = "U" + hex(ord(char))[2:].upper().rjust(5, "0")
replacement = "U%05X" % ord(char)
self.replaceCache[char] = replacement
return replacement
def unescapeChar(self, charcode):
return unichr(int(charcode[1:], 16))
return chr(int(charcode[1:], 16))

File diff suppressed because it is too large Load Diff

View File

@@ -1,142 +1,145 @@
from __future__ import absolute_import, division, unicode_literals
import re
from xml.sax.saxutils import escape, unescape
from tokenizer import HTMLTokenizer
from constants import tokenTypes
from .tokenizer import HTMLTokenizer
from .constants import tokenTypes
class HTMLSanitizerMixin(object):
""" sanitization of XHTML+MathML+SVG and of inline style attributes."""
acceptable_elements = ['a', 'abbr', 'acronym', 'address', 'area',
'article', 'aside', 'audio', 'b', 'big', 'blockquote', 'br', 'button',
'canvas', 'caption', 'center', 'cite', 'code', 'col', 'colgroup',
'command', 'datagrid', 'datalist', 'dd', 'del', 'details', 'dfn',
'dialog', 'dir', 'div', 'dl', 'dt', 'em', 'event-source', 'fieldset',
'figcaption', 'figure', 'footer', 'font', 'form', 'header', 'h1',
'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'img', 'input', 'ins',
'keygen', 'kbd', 'label', 'legend', 'li', 'm', 'map', 'menu', 'meter',
'multicol', 'nav', 'nextid', 'ol', 'output', 'optgroup', 'option',
'p', 'pre', 'progress', 'q', 's', 'samp', 'section', 'select',
'small', 'sound', 'source', 'spacer', 'span', 'strike', 'strong',
'sub', 'sup', 'table', 'tbody', 'td', 'textarea', 'time', 'tfoot',
'th', 'thead', 'tr', 'tt', 'u', 'ul', 'var', 'video']
'article', 'aside', 'audio', 'b', 'big', 'blockquote', 'br', 'button',
'canvas', 'caption', 'center', 'cite', 'code', 'col', 'colgroup',
'command', 'datagrid', 'datalist', 'dd', 'del', 'details', 'dfn',
'dialog', 'dir', 'div', 'dl', 'dt', 'em', 'event-source', 'fieldset',
'figcaption', 'figure', 'footer', 'font', 'form', 'header', 'h1',
'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'img', 'input', 'ins',
'keygen', 'kbd', 'label', 'legend', 'li', 'm', 'map', 'menu', 'meter',
'multicol', 'nav', 'nextid', 'ol', 'output', 'optgroup', 'option',
'p', 'pre', 'progress', 'q', 's', 'samp', 'section', 'select',
'small', 'sound', 'source', 'spacer', 'span', 'strike', 'strong',
'sub', 'sup', 'table', 'tbody', 'td', 'textarea', 'time', 'tfoot',
'th', 'thead', 'tr', 'tt', 'u', 'ul', 'var', 'video']
mathml_elements = ['maction', 'math', 'merror', 'mfrac', 'mi',
'mmultiscripts', 'mn', 'mo', 'mover', 'mpadded', 'mphantom',
'mprescripts', 'mroot', 'mrow', 'mspace', 'msqrt', 'mstyle', 'msub',
'msubsup', 'msup', 'mtable', 'mtd', 'mtext', 'mtr', 'munder',
'munderover', 'none']
'mmultiscripts', 'mn', 'mo', 'mover', 'mpadded', 'mphantom',
'mprescripts', 'mroot', 'mrow', 'mspace', 'msqrt', 'mstyle', 'msub',
'msubsup', 'msup', 'mtable', 'mtd', 'mtext', 'mtr', 'munder',
'munderover', 'none']
svg_elements = ['a', 'animate', 'animateColor', 'animateMotion',
'animateTransform', 'clipPath', 'circle', 'defs', 'desc', 'ellipse',
'font-face', 'font-face-name', 'font-face-src', 'g', 'glyph', 'hkern',
'linearGradient', 'line', 'marker', 'metadata', 'missing-glyph',
'mpath', 'path', 'polygon', 'polyline', 'radialGradient', 'rect',
'set', 'stop', 'svg', 'switch', 'text', 'title', 'tspan', 'use']
'animateTransform', 'clipPath', 'circle', 'defs', 'desc', 'ellipse',
'font-face', 'font-face-name', 'font-face-src', 'g', 'glyph', 'hkern',
'linearGradient', 'line', 'marker', 'metadata', 'missing-glyph',
'mpath', 'path', 'polygon', 'polyline', 'radialGradient', 'rect',
'set', 'stop', 'svg', 'switch', 'text', 'title', 'tspan', 'use']
acceptable_attributes = ['abbr', 'accept', 'accept-charset', 'accesskey',
'action', 'align', 'alt', 'autocomplete', 'autofocus', 'axis',
'background', 'balance', 'bgcolor', 'bgproperties', 'border',
'bordercolor', 'bordercolordark', 'bordercolorlight', 'bottompadding',
'cellpadding', 'cellspacing', 'ch', 'challenge', 'char', 'charoff',
'choff', 'charset', 'checked', 'cite', 'class', 'clear', 'color',
'cols', 'colspan', 'compact', 'contenteditable', 'controls', 'coords',
'data', 'datafld', 'datapagesize', 'datasrc', 'datetime', 'default',
'delay', 'dir', 'disabled', 'draggable', 'dynsrc', 'enctype', 'end',
'face', 'for', 'form', 'frame', 'galleryimg', 'gutter', 'headers',
'height', 'hidefocus', 'hidden', 'high', 'href', 'hreflang', 'hspace',
'icon', 'id', 'inputmode', 'ismap', 'keytype', 'label', 'leftspacing',
'lang', 'list', 'longdesc', 'loop', 'loopcount', 'loopend',
'loopstart', 'low', 'lowsrc', 'max', 'maxlength', 'media', 'method',
'min', 'multiple', 'name', 'nohref', 'noshade', 'nowrap', 'open',
'optimum', 'pattern', 'ping', 'point-size', 'prompt', 'pqg',
'radiogroup', 'readonly', 'rel', 'repeat-max', 'repeat-min',
'replace', 'required', 'rev', 'rightspacing', 'rows', 'rowspan',
'rules', 'scope', 'selected', 'shape', 'size', 'span', 'src', 'start',
'step', 'style', 'summary', 'suppress', 'tabindex', 'target',
'template', 'title', 'toppadding', 'type', 'unselectable', 'usemap',
'urn', 'valign', 'value', 'variable', 'volume', 'vspace', 'vrml',
'width', 'wrap', 'xml:lang']
'action', 'align', 'alt', 'autocomplete', 'autofocus', 'axis',
'background', 'balance', 'bgcolor', 'bgproperties', 'border',
'bordercolor', 'bordercolordark', 'bordercolorlight', 'bottompadding',
'cellpadding', 'cellspacing', 'ch', 'challenge', 'char', 'charoff',
'choff', 'charset', 'checked', 'cite', 'class', 'clear', 'color',
'cols', 'colspan', 'compact', 'contenteditable', 'controls', 'coords',
'data', 'datafld', 'datapagesize', 'datasrc', 'datetime', 'default',
'delay', 'dir', 'disabled', 'draggable', 'dynsrc', 'enctype', 'end',
'face', 'for', 'form', 'frame', 'galleryimg', 'gutter', 'headers',
'height', 'hidefocus', 'hidden', 'high', 'href', 'hreflang', 'hspace',
'icon', 'id', 'inputmode', 'ismap', 'keytype', 'label', 'leftspacing',
'lang', 'list', 'longdesc', 'loop', 'loopcount', 'loopend',
'loopstart', 'low', 'lowsrc', 'max', 'maxlength', 'media', 'method',
'min', 'multiple', 'name', 'nohref', 'noshade', 'nowrap', 'open',
'optimum', 'pattern', 'ping', 'point-size', 'poster', 'pqg', 'preload',
'prompt', 'radiogroup', 'readonly', 'rel', 'repeat-max', 'repeat-min',
'replace', 'required', 'rev', 'rightspacing', 'rows', 'rowspan',
'rules', 'scope', 'selected', 'shape', 'size', 'span', 'src', 'start',
'step', 'style', 'summary', 'suppress', 'tabindex', 'target',
'template', 'title', 'toppadding', 'type', 'unselectable', 'usemap',
'urn', 'valign', 'value', 'variable', 'volume', 'vspace', 'vrml',
'width', 'wrap', 'xml:lang']
mathml_attributes = ['actiontype', 'align', 'columnalign', 'columnalign',
'columnalign', 'columnlines', 'columnspacing', 'columnspan', 'depth',
'display', 'displaystyle', 'equalcolumns', 'equalrows', 'fence',
'fontstyle', 'fontweight', 'frame', 'height', 'linethickness', 'lspace',
'mathbackground', 'mathcolor', 'mathvariant', 'mathvariant', 'maxsize',
'minsize', 'other', 'rowalign', 'rowalign', 'rowalign', 'rowlines',
'rowspacing', 'rowspan', 'rspace', 'scriptlevel', 'selection',
'separator', 'stretchy', 'width', 'width', 'xlink:href', 'xlink:show',
'xlink:type', 'xmlns', 'xmlns:xlink']
svg_attributes = ['accent-height', 'accumulate', 'additive', 'alphabetic',
'arabic-form', 'ascent', 'attributeName', 'attributeType',
'baseProfile', 'bbox', 'begin', 'by', 'calcMode', 'cap-height',
'class', 'clip-path', 'color', 'color-rendering', 'content', 'cx',
'cy', 'd', 'dx', 'dy', 'descent', 'display', 'dur', 'end', 'fill',
'fill-opacity', 'fill-rule', 'font-family', 'font-size',
'font-stretch', 'font-style', 'font-variant', 'font-weight', 'from',
'fx', 'fy', 'g1', 'g2', 'glyph-name', 'gradientUnits', 'hanging',
'height', 'horiz-adv-x', 'horiz-origin-x', 'id', 'ideographic', 'k',
'keyPoints', 'keySplines', 'keyTimes', 'lang', 'marker-end',
'marker-mid', 'marker-start', 'markerHeight', 'markerUnits',
'markerWidth', 'mathematical', 'max', 'min', 'name', 'offset',
'opacity', 'orient', 'origin', 'overline-position',
'overline-thickness', 'panose-1', 'path', 'pathLength', 'points',
'preserveAspectRatio', 'r', 'refX', 'refY', 'repeatCount',
'repeatDur', 'requiredExtensions', 'requiredFeatures', 'restart',
'rotate', 'rx', 'ry', 'slope', 'stemh', 'stemv', 'stop-color',
'stop-opacity', 'strikethrough-position', 'strikethrough-thickness',
'stroke', 'stroke-dasharray', 'stroke-dashoffset', 'stroke-linecap',
'stroke-linejoin', 'stroke-miterlimit', 'stroke-opacity',
'stroke-width', 'systemLanguage', 'target', 'text-anchor', 'to',
'transform', 'type', 'u1', 'u2', 'underline-position',
'underline-thickness', 'unicode', 'unicode-range', 'units-per-em',
'values', 'version', 'viewBox', 'visibility', 'width', 'widths', 'x',
'x-height', 'x1', 'x2', 'xlink:actuate', 'xlink:arcrole',
'xlink:href', 'xlink:role', 'xlink:show', 'xlink:title', 'xlink:type',
'xml:base', 'xml:lang', 'xml:space', 'xmlns', 'xmlns:xlink', 'y',
'y1', 'y2', 'zoomAndPan']
'columnalign', 'columnlines', 'columnspacing', 'columnspan', 'depth',
'display', 'displaystyle', 'equalcolumns', 'equalrows', 'fence',
'fontstyle', 'fontweight', 'frame', 'height', 'linethickness', 'lspace',
'mathbackground', 'mathcolor', 'mathvariant', 'mathvariant', 'maxsize',
'minsize', 'other', 'rowalign', 'rowalign', 'rowalign', 'rowlines',
'rowspacing', 'rowspan', 'rspace', 'scriptlevel', 'selection',
'separator', 'stretchy', 'width', 'width', 'xlink:href', 'xlink:show',
'xlink:type', 'xmlns', 'xmlns:xlink']
attr_val_is_uri = ['href', 'src', 'cite', 'action', 'longdesc',
'xlink:href', 'xml:base']
svg_attributes = ['accent-height', 'accumulate', 'additive', 'alphabetic',
'arabic-form', 'ascent', 'attributeName', 'attributeType',
'baseProfile', 'bbox', 'begin', 'by', 'calcMode', 'cap-height',
'class', 'clip-path', 'color', 'color-rendering', 'content', 'cx',
'cy', 'd', 'dx', 'dy', 'descent', 'display', 'dur', 'end', 'fill',
'fill-opacity', 'fill-rule', 'font-family', 'font-size',
'font-stretch', 'font-style', 'font-variant', 'font-weight', 'from',
'fx', 'fy', 'g1', 'g2', 'glyph-name', 'gradientUnits', 'hanging',
'height', 'horiz-adv-x', 'horiz-origin-x', 'id', 'ideographic', 'k',
'keyPoints', 'keySplines', 'keyTimes', 'lang', 'marker-end',
'marker-mid', 'marker-start', 'markerHeight', 'markerUnits',
'markerWidth', 'mathematical', 'max', 'min', 'name', 'offset',
'opacity', 'orient', 'origin', 'overline-position',
'overline-thickness', 'panose-1', 'path', 'pathLength', 'points',
'preserveAspectRatio', 'r', 'refX', 'refY', 'repeatCount',
'repeatDur', 'requiredExtensions', 'requiredFeatures', 'restart',
'rotate', 'rx', 'ry', 'slope', 'stemh', 'stemv', 'stop-color',
'stop-opacity', 'strikethrough-position', 'strikethrough-thickness',
'stroke', 'stroke-dasharray', 'stroke-dashoffset', 'stroke-linecap',
'stroke-linejoin', 'stroke-miterlimit', 'stroke-opacity',
'stroke-width', 'systemLanguage', 'target', 'text-anchor', 'to',
'transform', 'type', 'u1', 'u2', 'underline-position',
'underline-thickness', 'unicode', 'unicode-range', 'units-per-em',
'values', 'version', 'viewBox', 'visibility', 'width', 'widths', 'x',
'x-height', 'x1', 'x2', 'xlink:actuate', 'xlink:arcrole',
'xlink:href', 'xlink:role', 'xlink:show', 'xlink:title', 'xlink:type',
'xml:base', 'xml:lang', 'xml:space', 'xmlns', 'xmlns:xlink', 'y',
'y1', 'y2', 'zoomAndPan']
attr_val_is_uri = ['href', 'src', 'cite', 'action', 'longdesc', 'poster',
'xlink:href', 'xml:base']
svg_attr_val_allows_ref = ['clip-path', 'color-profile', 'cursor', 'fill',
'filter', 'marker', 'marker-start', 'marker-mid', 'marker-end',
'mask', 'stroke']
'filter', 'marker', 'marker-start', 'marker-mid', 'marker-end',
'mask', 'stroke']
svg_allow_local_href = ['altGlyph', 'animate', 'animateColor',
'animateMotion', 'animateTransform', 'cursor', 'feImage', 'filter',
'linearGradient', 'pattern', 'radialGradient', 'textpath', 'tref',
'set', 'use']
'animateMotion', 'animateTransform', 'cursor', 'feImage', 'filter',
'linearGradient', 'pattern', 'radialGradient', 'textpath', 'tref',
'set', 'use']
acceptable_css_properties = ['azimuth', 'background-color',
'border-bottom-color', 'border-collapse', 'border-color',
'border-left-color', 'border-right-color', 'border-top-color', 'clear',
'color', 'cursor', 'direction', 'display', 'elevation', 'float', 'font',
'font-family', 'font-size', 'font-style', 'font-variant', 'font-weight',
'height', 'letter-spacing', 'line-height', 'overflow', 'pause',
'pause-after', 'pause-before', 'pitch', 'pitch-range', 'richness',
'speak', 'speak-header', 'speak-numeral', 'speak-punctuation',
'speech-rate', 'stress', 'text-align', 'text-decoration', 'text-indent',
'unicode-bidi', 'vertical-align', 'voice-family', 'volume',
'white-space', 'width']
'border-bottom-color', 'border-collapse', 'border-color',
'border-left-color', 'border-right-color', 'border-top-color', 'clear',
'color', 'cursor', 'direction', 'display', 'elevation', 'float', 'font',
'font-family', 'font-size', 'font-style', 'font-variant', 'font-weight',
'height', 'letter-spacing', 'line-height', 'overflow', 'pause',
'pause-after', 'pause-before', 'pitch', 'pitch-range', 'richness',
'speak', 'speak-header', 'speak-numeral', 'speak-punctuation',
'speech-rate', 'stress', 'text-align', 'text-decoration', 'text-indent',
'unicode-bidi', 'vertical-align', 'voice-family', 'volume',
'white-space', 'width']
acceptable_css_keywords = ['auto', 'aqua', 'black', 'block', 'blue',
'bold', 'both', 'bottom', 'brown', 'center', 'collapse', 'dashed',
'dotted', 'fuchsia', 'gray', 'green', '!important', 'italic', 'left',
'lime', 'maroon', 'medium', 'none', 'navy', 'normal', 'nowrap', 'olive',
'pointer', 'purple', 'red', 'right', 'solid', 'silver', 'teal', 'top',
'transparent', 'underline', 'white', 'yellow']
acceptable_svg_properties = [ 'fill', 'fill-opacity', 'fill-rule',
'stroke', 'stroke-width', 'stroke-linecap', 'stroke-linejoin',
'stroke-opacity']
acceptable_protocols = [ 'ed2k', 'ftp', 'http', 'https', 'irc',
'mailto', 'news', 'gopher', 'nntp', 'telnet', 'webcal',
'xmpp', 'callto', 'feed', 'urn', 'aim', 'rsync', 'tag',
'ssh', 'sftp', 'rtsp', 'afs' ]
'bold', 'both', 'bottom', 'brown', 'center', 'collapse', 'dashed',
'dotted', 'fuchsia', 'gray', 'green', '!important', 'italic', 'left',
'lime', 'maroon', 'medium', 'none', 'navy', 'normal', 'nowrap', 'olive',
'pointer', 'purple', 'red', 'right', 'solid', 'silver', 'teal', 'top',
'transparent', 'underline', 'white', 'yellow']
acceptable_svg_properties = ['fill', 'fill-opacity', 'fill-rule',
'stroke', 'stroke-width', 'stroke-linecap', 'stroke-linejoin',
'stroke-opacity']
acceptable_protocols = ['ed2k', 'ftp', 'http', 'https', 'irc',
'mailto', 'news', 'gopher', 'nntp', 'telnet', 'webcal',
'xmpp', 'callto', 'feed', 'urn', 'aim', 'rsync', 'tag',
'ssh', 'sftp', 'rtsp', 'afs']
# subclasses may define their own versions of these constants
allowed_elements = acceptable_elements + mathml_elements + svg_elements
allowed_attributes = acceptable_attributes + mathml_attributes + svg_attributes
@@ -160,94 +163,104 @@ class HTMLSanitizerMixin(object):
# accommodate filters which use token_type differently
token_type = token["type"]
if token_type in tokenTypes.keys():
token_type = tokenTypes[token_type]
if token_type in list(tokenTypes.keys()):
token_type = tokenTypes[token_type]
if token_type in (tokenTypes["StartTag"], tokenTypes["EndTag"],
tokenTypes["EmptyTag"]):
if token_type in (tokenTypes["StartTag"], tokenTypes["EndTag"],
tokenTypes["EmptyTag"]):
if token["name"] in self.allowed_elements:
if token.has_key("data"):
attrs = dict([(name,val) for name,val in
token["data"][::-1]
if name in self.allowed_attributes])
for attr in self.attr_val_is_uri:
if not attrs.has_key(attr):
continue
val_unescaped = re.sub("[`\000-\040\177-\240\s]+", '',
unescape(attrs[attr])).lower()
#remove replacement characters from unescaped characters
val_unescaped = val_unescaped.replace(u"\ufffd", "")
if (re.match("^[a-z0-9][-+.a-z0-9]*:",val_unescaped) and
(val_unescaped.split(':')[0] not in
self.allowed_protocols)):
del attrs[attr]
for attr in self.svg_attr_val_allows_ref:
if attr in attrs:
attrs[attr] = re.sub(r'url\s*\(\s*[^#\s][^)]+?\)',
' ',
unescape(attrs[attr]))
if (token["name"] in self.svg_allow_local_href and
'xlink:href' in attrs and re.search('^\s*[^#\s].*',
attrs['xlink:href'])):
del attrs['xlink:href']
if attrs.has_key('style'):
attrs['style'] = self.sanitize_css(attrs['style'])
token["data"] = [[name,val] for name,val in attrs.items()]
return token
return self.allowed_token(token, token_type)
else:
if token_type == tokenTypes["EndTag"]:
token["data"] = "</%s>" % token["name"]
elif token["data"]:
attrs = ''.join([' %s="%s"' % (k,escape(v)) for k,v in token["data"]])
token["data"] = "<%s%s>" % (token["name"],attrs)
else:
token["data"] = "<%s>" % token["name"]
if token.get("selfClosing"):
token["data"]=token["data"][:-1] + "/>"
if token["type"] in tokenTypes.keys():
token["type"] = "Characters"
else:
token["type"] = tokenTypes["Characters"]
del token["name"]
return token
return self.disallowed_token(token, token_type)
elif token_type == tokenTypes["Comment"]:
pass
else:
return token
def allowed_token(self, token, token_type):
if "data" in token:
attrs = dict([(name, val) for name, val in
token["data"][::-1]
if name in self.allowed_attributes])
for attr in self.attr_val_is_uri:
if attr not in attrs:
continue
val_unescaped = re.sub("[`\000-\040\177-\240\s]+", '',
unescape(attrs[attr])).lower()
# remove replacement characters from unescaped characters
val_unescaped = val_unescaped.replace("\ufffd", "")
if (re.match("^[a-z0-9][-+.a-z0-9]*:", val_unescaped) and
(val_unescaped.split(':')[0] not in
self.allowed_protocols)):
del attrs[attr]
for attr in self.svg_attr_val_allows_ref:
if attr in attrs:
attrs[attr] = re.sub(r'url\s*\(\s*[^#\s][^)]+?\)',
' ',
unescape(attrs[attr]))
if (token["name"] in self.svg_allow_local_href and
'xlink:href' in attrs and re.search('^\s*[^#\s].*',
attrs['xlink:href'])):
del attrs['xlink:href']
if 'style' in attrs:
attrs['style'] = self.sanitize_css(attrs['style'])
token["data"] = [[name, val] for name, val in list(attrs.items())]
return token
def disallowed_token(self, token, token_type):
if token_type == tokenTypes["EndTag"]:
token["data"] = "</%s>" % token["name"]
elif token["data"]:
attrs = ''.join([' %s="%s"' % (k, escape(v)) for k, v in token["data"]])
token["data"] = "<%s%s>" % (token["name"], attrs)
else:
token["data"] = "<%s>" % token["name"]
if token.get("selfClosing"):
token["data"] = token["data"][:-1] + "/>"
if token["type"] in list(tokenTypes.keys()):
token["type"] = "Characters"
else:
token["type"] = tokenTypes["Characters"]
del token["name"]
return token
def sanitize_css(self, style):
# disallow urls
style=re.compile('url\s*\(\s*[^\s)]+?\s*\)\s*').sub(' ',style)
style = re.compile('url\s*\(\s*[^\s)]+?\s*\)\s*').sub(' ', style)
# gauntlet
if not re.match("""^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$""", style): return ''
if not re.match("^\s*([-\w]+\s*:[^:;]*(;\s*|$))*$", style): return ''
if not re.match("""^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$""", style):
return ''
if not re.match("^\s*([-\w]+\s*:[^:;]*(;\s*|$))*$", style):
return ''
clean = []
for prop,value in re.findall("([-\w]+)\s*:\s*([^:;]*)",style):
if not value: continue
if prop.lower() in self.allowed_css_properties:
clean.append(prop + ': ' + value + ';')
elif prop.split('-')[0].lower() in ['background','border','margin',
'padding']:
for keyword in value.split():
if not keyword in self.acceptable_css_keywords and \
not re.match("^(#[0-9a-f]+|rgb\(\d+%?,\d*%?,?\d*%?\)?|\d{0,2}\.?\d{0,2}(cm|em|ex|in|mm|pc|pt|px|%|,|\))?)$",keyword):
break
else:
clean.append(prop + ': ' + value + ';')
elif prop.lower() in self.allowed_svg_properties:
clean.append(prop + ': ' + value + ';')
for prop, value in re.findall("([-\w]+)\s*:\s*([^:;]*)", style):
if not value:
continue
if prop.lower() in self.allowed_css_properties:
clean.append(prop + ': ' + value + ';')
elif prop.split('-')[0].lower() in ['background', 'border', 'margin',
'padding']:
for keyword in value.split():
if not keyword in self.acceptable_css_keywords and \
not re.match("^(#[0-9a-f]+|rgb\(\d+%?,\d*%?,?\d*%?\)?|\d{0,2}\.?\d{0,2}(cm|em|ex|in|mm|pc|pt|px|%|,|\))?)$", keyword):
break
else:
clean.append(prop + ': ' + value + ';')
elif prop.lower() in self.allowed_svg_properties:
clean.append(prop + ': ' + value + ';')
return ' '.join(clean)
class HTMLSanitizer(HTMLTokenizer, HTMLSanitizerMixin):
def __init__(self, stream, encoding=None, parseMeta=True, useChardet=True,
lowercaseElementName=False, lowercaseAttrName=False, parser=None):
#Change case matching defaults as we only output lowercase html anyway
#This solution doesn't seem ideal...
# Change case matching defaults as we only output lowercase html anyway
# This solution doesn't seem ideal...
HTMLTokenizer.__init__(self, stream, encoding, parseMeta, useChardet,
lowercaseElementName, lowercaseAttrName, parser=parser)

View File

@@ -1,17 +1,16 @@
from __future__ import absolute_import, division, unicode_literals
from html5lib import treewalkers
from .. import treewalkers
from htmlserializer import HTMLSerializer
from xhtmlserializer import XHTMLSerializer
from .htmlserializer import HTMLSerializer
def serialize(input, tree="simpletree", format="html", encoding=None,
def serialize(input, tree="etree", format="html", encoding=None,
**serializer_opts):
# XXX: Should we cache this?
walker = treewalkers.getTreeWalker(tree)
walker = treewalkers.getTreeWalker(tree)
if format == "html":
s = HTMLSerializer(**serializer_opts)
elif format == "xhtml":
s = XHTMLSerializer(**serializer_opts)
else:
raise ValueError, "type must be either html or xhtml"
raise ValueError("type must be html")
return s.render(walker(input), encoding)

View File

@@ -1,18 +1,20 @@
try:
frozenset
except NameError:
# Import from the sets module for python 2.3
from sets import ImmutableSet as frozenset
from __future__ import absolute_import, division, unicode_literals
from six import text_type
import gettext
_ = gettext.gettext
from html5lib.constants import voidElements, booleanAttributes, spaceCharacters
from html5lib.constants import rcdataElements, entities, xmlEntities
from html5lib import utils
try:
from functools import reduce
except ImportError:
pass
from ..constants import voidElements, booleanAttributes, spaceCharacters
from ..constants import rcdataElements, entities, xmlEntities
from .. import utils
from xml.sax.saxutils import escape
spaceCharacters = u"".join(spaceCharacters)
spaceCharacters = "".join(spaceCharacters)
try:
from codecs import register_error, xmlcharrefreplace_errors
@@ -21,24 +23,18 @@ except ImportError:
else:
unicode_encode_errors = "htmlentityreplace"
from html5lib.constants import entities
encode_entity_map = {}
is_ucs4 = len(u"\U0010FFFF") == 1
for k, v in entities.items():
#skip multi-character entities
is_ucs4 = len("\U0010FFFF") == 1
for k, v in list(entities.items()):
# skip multi-character entities
if ((is_ucs4 and len(v) > 1) or
(not is_ucs4 and len(v) > 2)):
(not is_ucs4 and len(v) > 2)):
continue
if v != "&":
if len(v) == 2:
v = utils.surrogatePairToCodepoint(v)
else:
try:
v = ord(v)
except:
print v
raise
v = ord(v)
if not v in encode_entity_map or k.islower():
# prefer &lt; over &LT; and similarly for &amp;, &gt;, etc.
encode_entity_map[v] = k
@@ -53,8 +49,8 @@ else:
skip = False
continue
index = i + exc.start
if utils.isSurrogatePair(exc.object[index:min([exc.end, index+2])]):
codepoint = utils.surrogatePairToCodepoint(exc.object[index:index+2])
if utils.isSurrogatePair(exc.object[index:min([exc.end, index + 2])]):
codepoint = utils.surrogatePairToCodepoint(exc.object[index:index + 2])
skip = True
else:
codepoint = ord(c)
@@ -67,8 +63,8 @@ else:
if not e.endswith(";"):
res.append(";")
else:
res.append("&#x%s;"%(hex(cp)[2:]))
return (u"".join(res), exc.end)
res.append("&#x%s;" % (hex(cp)[2:]))
return ("".join(res), exc.end)
else:
return xmlcharrefreplace_errors(exc)
@@ -81,7 +77,7 @@ class HTMLSerializer(object):
# attribute quoting options
quote_attr_values = False
quote_char = u'"'
quote_char = '"'
use_best_quote_char = True
# tag syntax options
@@ -96,15 +92,17 @@ class HTMLSerializer(object):
resolve_entities = True
# miscellaneous options
alphabetical_attributes = False
inject_meta_charset = True
strip_whitespace = False
sanitize = False
options = ("quote_attr_values", "quote_char", "use_best_quote_char",
"minimize_boolean_attributes", "use_trailing_solidus",
"space_before_trailing_solidus", "omit_optional_tags",
"strip_whitespace", "inject_meta_charset", "escape_lt_in_attrs",
"escape_rcdata", "resolve_entities", "sanitize")
"omit_optional_tags", "minimize_boolean_attributes",
"use_trailing_solidus", "space_before_trailing_solidus",
"escape_lt_in_attrs", "escape_rcdata", "resolve_entities",
"alphabetical_attributes", "inject_meta_charset",
"strip_whitespace", "sanitize")
def __init__(self, **kwargs):
"""Initialize HTMLSerializer.
@@ -147,10 +145,12 @@ class HTMLSerializer(object):
See `html5lib user documentation`_
omit_optional_tags=True|False
Omit start/end tags that are optional.
alphabetical_attributes=False|True
Reorder attributes to be in alphabetical order.
.. _html5lib user documentation: http://code.google.com/p/html5lib/wiki/UserDocumentation
"""
if kwargs.has_key('quote_char'):
if 'quote_char' in kwargs:
self.use_best_quote_char = False
for attr in self.options:
setattr(self, attr, kwargs.get(attr, getattr(self, attr)))
@@ -158,14 +158,14 @@ class HTMLSerializer(object):
self.strict = False
def encode(self, string):
assert(isinstance(string, unicode))
assert(isinstance(string, text_type))
if self.encoding:
return string.encode(self.encoding, unicode_encode_errors)
else:
return string
def encodeStrict(self, string):
assert(isinstance(string, unicode))
assert(isinstance(string, text_type))
if self.encoding:
return string.encode(self.encoding, "strict")
else:
@@ -175,39 +175,46 @@ class HTMLSerializer(object):
self.encoding = encoding
in_cdata = False
self.errors = []
if encoding and self.inject_meta_charset:
from html5lib.filters.inject_meta_charset import Filter
from ..filters.inject_meta_charset import Filter
treewalker = Filter(treewalker, encoding)
# XXX: WhitespaceFilter should be used before OptionalTagFilter
# WhitespaceFilter should be used before OptionalTagFilter
# for maximum efficiently of this latter filter
if self.strip_whitespace:
from html5lib.filters.whitespace import Filter
from ..filters.whitespace import Filter
treewalker = Filter(treewalker)
if self.sanitize:
from html5lib.filters.sanitizer import Filter
from ..filters.sanitizer import Filter
treewalker = Filter(treewalker)
if self.omit_optional_tags:
from html5lib.filters.optionaltags import Filter
from ..filters.optionaltags import Filter
treewalker = Filter(treewalker)
# Alphabetical attributes must be last, as other filters
# could add attributes and alter the order
if self.alphabetical_attributes:
from ..filters.alphabeticalattributes import Filter
treewalker = Filter(treewalker)
for token in treewalker:
type = token["type"]
if type == "Doctype":
doctype = u"<!DOCTYPE %s" % token["name"]
doctype = "<!DOCTYPE %s" % token["name"]
if token["publicId"]:
doctype += u' PUBLIC "%s"' % token["publicId"]
doctype += ' PUBLIC "%s"' % token["publicId"]
elif token["systemId"]:
doctype += u" SYSTEM"
if token["systemId"]:
if token["systemId"].find(u'"') >= 0:
if token["systemId"].find(u"'") >= 0:
doctype += " SYSTEM"
if token["systemId"]:
if token["systemId"].find('"') >= 0:
if token["systemId"].find("'") >= 0:
self.serializeError(_("System identifer contains both single and double quote characters"))
quote_char = u"'"
quote_char = "'"
else:
quote_char = u'"'
doctype += u" %s%s%s" % (quote_char, token["systemId"], quote_char)
doctype += u">"
quote_char = '"'
doctype += " %s%s%s" % (quote_char, token["systemId"], quote_char)
doctype += ">"
yield self.encodeStrict(doctype)
elif type in ("Characters", "SpaceCharacters"):
@@ -220,41 +227,41 @@ class HTMLSerializer(object):
elif type in ("StartTag", "EmptyTag"):
name = token["name"]
yield self.encodeStrict(u"<%s" % name)
yield self.encodeStrict("<%s" % name)
if name in rcdataElements and not self.escape_rcdata:
in_cdata = True
elif in_cdata:
self.serializeError(_("Unexpected child element of a CDATA element"))
attributes = []
for (attr_namespace,attr_name),attr_value in sorted(token["data"].items()):
#TODO: Add namespace support here
for (attr_namespace, attr_name), attr_value in token["data"].items():
# TODO: Add namespace support here
k = attr_name
v = attr_value
yield self.encodeStrict(u' ')
yield self.encodeStrict(' ')
yield self.encodeStrict(k)
if not self.minimize_boolean_attributes or \
(k not in booleanAttributes.get(name, tuple()) \
and k not in booleanAttributes.get("", tuple())):
yield self.encodeStrict(u"=")
(k not in booleanAttributes.get(name, tuple())
and k not in booleanAttributes.get("", tuple())):
yield self.encodeStrict("=")
if self.quote_attr_values or not v:
quote_attr = True
else:
quote_attr = reduce(lambda x,y: x or (y in v),
spaceCharacters + u">\"'=", False)
v = v.replace(u"&", u"&amp;")
if self.escape_lt_in_attrs: v = v.replace(u"<", u"&lt;")
quote_attr = reduce(lambda x, y: x or (y in v),
spaceCharacters + ">\"'=", False)
v = v.replace("&", "&amp;")
if self.escape_lt_in_attrs:
v = v.replace("<", "&lt;")
if quote_attr:
quote_char = self.quote_char
if self.use_best_quote_char:
if u"'" in v and u'"' not in v:
quote_char = u'"'
elif u'"' in v and u"'" not in v:
quote_char = u"'"
if quote_char == u"'":
v = v.replace(u"'", u"&#39;")
if "'" in v and '"' not in v:
quote_char = '"'
elif '"' in v and "'" not in v:
quote_char = "'"
if quote_char == "'":
v = v.replace("'", "&#39;")
else:
v = v.replace(u'"', u"&quot;")
v = v.replace('"', "&quot;")
yield self.encodeStrict(quote_char)
yield self.encode(v)
yield self.encodeStrict(quote_char)
@@ -262,10 +269,10 @@ class HTMLSerializer(object):
yield self.encode(v)
if name in voidElements and self.use_trailing_solidus:
if self.space_before_trailing_solidus:
yield self.encodeStrict(u" /")
yield self.encodeStrict(" /")
else:
yield self.encodeStrict(u"/")
yield self.encode(u">")
yield self.encodeStrict("/")
yield self.encode(">")
elif type == "EndTag":
name = token["name"]
@@ -273,13 +280,13 @@ class HTMLSerializer(object):
in_cdata = False
elif in_cdata:
self.serializeError(_("Unexpected child element of a CDATA element"))
yield self.encodeStrict(u"</%s>" % name)
yield self.encodeStrict("</%s>" % name)
elif type == "Comment":
data = token["data"]
if data.find("--") >= 0:
self.serializeError(_("Comment contains --"))
yield self.encodeStrict(u"<!--%s-->" % token["data"])
yield self.encodeStrict("<!--%s-->" % token["data"])
elif type == "Entity":
name = token["name"]
@@ -289,7 +296,7 @@ class HTMLSerializer(object):
if self.resolve_entities and key not in xmlEntities:
data = entities[key]
else:
data = u"&%s;" % name
data = "&%s;" % name
yield self.encodeStrict(data)
else:
@@ -297,9 +304,9 @@ class HTMLSerializer(object):
def render(self, treewalker, encoding=None):
if encoding:
return "".join(list(self.serialize(treewalker, encoding)))
return b"".join(list(self.serialize(treewalker, encoding)))
else:
return u"".join(list(self.serialize(treewalker)))
return "".join(list(self.serialize(treewalker)))
def serializeError(self, data="XXX ERROR MESSAGE NEEDED"):
# XXX The idea is to make data mandatory.
@@ -307,6 +314,7 @@ class HTMLSerializer(object):
if self.strict:
raise SerializeError
def SerializeError(Exception):
"""Error in serialized tree"""
pass

View File

@@ -1,9 +0,0 @@
from htmlserializer import HTMLSerializer
class XHTMLSerializer(HTMLSerializer):
quote_attr_values = True
minimize_boolean_attributes = False
use_trailing_solidus = True
escape_lt_in_attrs = True
omit_optional_tags = False
escape_rcdata = True

File diff suppressed because it is too large Load Diff

View File

View File

@@ -0,0 +1,44 @@
from __future__ import absolute_import, division, unicode_literals
from xml.sax.xmlreader import AttributesNSImpl
from ..constants import adjustForeignAttributes, unadjustForeignAttributes
prefix_mapping = {}
for prefix, localName, namespace in adjustForeignAttributes.values():
if prefix is not None:
prefix_mapping[prefix] = namespace
def to_sax(walker, handler):
"""Call SAX-like content handler based on treewalker walker"""
handler.startDocument()
for prefix, namespace in prefix_mapping.items():
handler.startPrefixMapping(prefix, namespace)
for token in walker:
type = token["type"]
if type == "Doctype":
continue
elif type in ("StartTag", "EmptyTag"):
attrs = AttributesNSImpl(token["data"],
unadjustForeignAttributes)
handler.startElementNS((token["namespace"], token["name"]),
token["name"],
attrs)
if type == "EmptyTag":
handler.endElementNS((token["namespace"], token["name"]),
token["name"])
elif type == "EndTag":
handler.endElementNS((token["namespace"], token["name"]),
token["name"])
elif type in ("Characters", "SpaceCharacters"):
handler.characters(token["data"])
elif type == "Comment":
pass
else:
assert False, "Unknown token type"
for prefix, namespace in prefix_mapping.items():
handler.endPrefixMapping(prefix)
handler.endDocument()

View File

@@ -7,7 +7,7 @@ implement several things:
1) A set of classes for various types of elements: Document, Doctype,
Comment, Element. These must implement the interface of
_base.treebuilders.Node (although comment nodes have a different
signature for their constructor, see treebuilders.simpletree.Comment)
signature for their constructor, see treebuilders.etree.Comment)
Textual content may also be implemented as another node type, or not, as
your tree implementation requires.
@@ -24,73 +24,53 @@ getDocument - Returns the root node of the complete document tree
testSerializer method on your treebuilder which accepts a node and
returns a string containing Node and its children serialized according
to the format used in the unittests
The supplied simpletree module provides a python-only implementation
of a full treebuilder and is a useful reference for the semantics of
the various methods.
"""
from __future__ import absolute_import, division, unicode_literals
from ..utils import default_etree
treeBuilderCache = {}
import sys
def getTreeBuilder(treeType, implementation=None, **kwargs):
"""Get a TreeBuilder class for various types of tree with built-in support
treeType - the name of the tree type required (case-insensitive). Supported
values are "simpletree", "dom", "etree" and "beautifulsoup"
"simpletree" - a built-in DOM-ish tree type with support for some
more pythonic idioms.
"dom" - A generic builder for DOM implementations, defaulting to
a xml.dom.minidom based implementation for the sake of
backwards compatibility (as releases up until 0.10 had a
builder called "dom" that was a minidom implemenation).
"etree" - A generic builder for tree implementations exposing an
elementtree-like interface (known to work with
ElementTree, cElementTree and lxml.etree).
"beautifulsoup" - Beautiful soup (if installed)
values are:
"dom" - A generic builder for DOM implementations, defaulting to
a xml.dom.minidom based implementation.
"etree" - A generic builder for tree implementations exposing an
ElementTree-like interface, defaulting to
xml.etree.cElementTree if available and
xml.etree.ElementTree if not.
"lxml" - A etree-based builder for lxml.etree, handling
limitations of lxml's implementation.
implementation - (Currently applies to the "etree" and "dom" tree types). A
module implementing the tree type e.g.
xml.etree.ElementTree or lxml.etree."""
xml.etree.ElementTree or xml.etree.cElementTree."""
treeType = treeType.lower()
if treeType not in treeBuilderCache:
if treeType == "dom":
import dom
# XXX: Keep backwards compatibility by using minidom if no implementation is given
if implementation == None:
from . import dom
# Come up with a sane default (pref. from the stdlib)
if implementation is None:
from xml.dom import minidom
implementation = minidom
# XXX: NEVER cache here, caching is done in the dom submodule
# NEVER cache here, caching is done in the dom submodule
return dom.getDomModule(implementation, **kwargs).TreeBuilder
elif treeType == "simpletree":
import simpletree
treeBuilderCache[treeType] = simpletree.TreeBuilder
elif treeType == "beautifulsoup":
import soup
treeBuilderCache[treeType] = soup.TreeBuilder
elif treeType == "lxml":
import etree_lxml
from . import etree_lxml
treeBuilderCache[treeType] = etree_lxml.TreeBuilder
elif treeType == "etree":
# Come up with a sane default
if implementation == None:
try:
import xml.etree.cElementTree as ET
except ImportError:
try:
import xml.etree.ElementTree as ET
except ImportError:
try:
import cElementTree as ET
except ImportError:
import elementtree.ElementTree as ET
implementation = ET
import etree
from . import etree
if implementation is None:
implementation = default_etree
# NEVER cache here, caching is done in the etree submodule
return etree.getETreeModule(implementation, **kwargs).TreeBuilder
else:
raise ValueError("""Unrecognised treebuilder "%s" """%treeType)
raise ValueError("""Unrecognised treebuilder "%s" """ % treeType)
return treeBuilderCache.get(treeType)

View File

@@ -1,25 +1,34 @@
from html5lib.constants import scopingElements, tableInsertModeElements, namespaces
try:
frozenset
except NameError:
# Import from the sets module for python 2.3
from sets import Set as set
from sets import ImmutableSet as frozenset
from __future__ import absolute_import, division, unicode_literals
from six import text_type
from ..constants import scopingElements, tableInsertModeElements, namespaces
# The scope markers are inserted when entering object elements,
# marquees, table cells, and table captions, and are used to prevent formatting
# from "leaking" into tables, object elements, and marquees.
Marker = None
listElementsMap = {
None: (frozenset(scopingElements), False),
"button": (frozenset(scopingElements | set([(namespaces["html"], "button")])), False),
"list": (frozenset(scopingElements | set([(namespaces["html"], "ol"),
(namespaces["html"], "ul")])), False),
"table": (frozenset([(namespaces["html"], "html"),
(namespaces["html"], "table")]), False),
"select": (frozenset([(namespaces["html"], "optgroup"),
(namespaces["html"], "option")]), True)
}
class Node(object):
def __init__(self, name):
"""Node representing an item in the tree.
name - The tag name associated with the node
parent - The parent of the current node (or None for the document node)
value - The value of the current node (applies to text nodes and
value - The value of the current node (applies to text nodes and
comments
attributes - a dict holding name, value pairs for attributes of the node
childNodes - a list of child nodes of the current node. This must
childNodes - a list of child nodes of the current node. This must
include all elements but not necessarily other node types
_flags - A list of miscellaneous flags that can be set on the node
"""
@@ -30,14 +39,14 @@ class Node(object):
self.childNodes = []
self._flags = []
def __unicode__(self):
attributesStr = " ".join(["%s=\"%s\""%(name, value)
for name, value in
self.attributes.iteritems()])
def __str__(self):
attributesStr = " ".join(["%s=\"%s\"" % (name, value)
for name, value in
self.attributes.items()])
if attributesStr:
return "<%s %s>"%(self.name,attributesStr)
return "<%s %s>" % (self.name, attributesStr)
else:
return "<%s>"%(self.name)
return "<%s>" % (self.name)
def __repr__(self):
return "<%s>" % (self.name)
@@ -48,14 +57,14 @@ class Node(object):
raise NotImplementedError
def insertText(self, data, insertBefore=None):
"""Insert data as text in the current node, positioned before the
"""Insert data as text in the current node, positioned before the
start of node insertBefore or to the end of the node's text.
"""
raise NotImplementedError
def insertBefore(self, node, refNode):
"""Insert node as a child of the current node, before refNode in the
list of child nodes. Raises ValueError if refNode is not a child of
"""Insert node as a child of the current node, before refNode in the
list of child nodes. Raises ValueError if refNode is not a child of
the current node"""
raise NotImplementedError
@@ -65,11 +74,11 @@ class Node(object):
raise NotImplementedError
def reparentChildren(self, newParent):
"""Move all the children of the current node to newParent.
This is needed so that trees that don't store text as nodes move the
"""Move all the children of the current node to newParent.
This is needed so that trees that don't store text as nodes move the
text in the correct way
"""
#XXX - should this method be made more general?
# XXX - should this method be made more general?
for child in self.childNodes:
newParent.appendChild(child)
self.childNodes = []
@@ -80,12 +89,12 @@ class Node(object):
"""
raise NotImplementedError
def hasContent(self):
"""Return true if the node has children or text, false otherwise
"""
raise NotImplementedError
class ActiveFormattingElements(list):
def append(self, node):
equalCount = 0
@@ -103,12 +112,13 @@ class ActiveFormattingElements(list):
def nodesEqual(self, node1, node2):
if not node1.nameTuple == node2.nameTuple:
return False
if not node1.attributes == node2.attributes:
return False
return True
class TreeBuilder(object):
"""Base treebuilder implementation
documentClass - the class to use for the bottommost node of a document
@@ -117,19 +127,19 @@ class TreeBuilder(object):
doctypeClass - the class to use for doctypes
"""
#Document class
# Document class
documentClass = None
#The class to use for creating a node
# The class to use for creating a node
elementClass = None
#The class to use for creating comments
# The class to use for creating comments
commentClass = None
#The class to use for creating doctypes
# The class to use for creating doctypes
doctypeClass = None
#Fragment class
# Fragment class
fragmentClass = None
def __init__(self, namespaceHTMLElements):
@@ -138,12 +148,12 @@ class TreeBuilder(object):
else:
self.defaultNamespace = None
self.reset()
def reset(self):
self.openElements = []
self.activeFormattingElements = ActiveFormattingElements()
#XXX - rename these to headElement, formElement
# XXX - rename these to headElement, formElement
self.headPointer = None
self.formPointer = None
@@ -153,30 +163,20 @@ class TreeBuilder(object):
def elementInScope(self, target, variant=None):
#If we pass a node in we match that. if we pass a string
#match any node with that name
# If we pass a node in we match that. if we pass a string
# match any node with that name
exactNode = hasattr(target, "nameTuple")
listElementsMap = {
None:(scopingElements, False),
"button":(scopingElements | set([(namespaces["html"], "button")]), False),
"list":(scopingElements | set([(namespaces["html"], "ol"),
(namespaces["html"], "ul")]), False),
"table":(set([(namespaces["html"], "html"),
(namespaces["html"], "table")]), False),
"select":(set([(namespaces["html"], "optgroup"),
(namespaces["html"], "option")]), True)
}
listElements, invert = listElementsMap[variant]
for node in reversed(self.openElements):
if (node.name == target and not exactNode or
node == target and exactNode):
node == target and exactNode):
return True
elif (invert ^ (node.nameTuple in listElements)):
elif (invert ^ (node.nameTuple in listElements)):
return False
assert False # We should never reach this point
assert False # We should never reach this point
def reconstructActiveFormattingElements(self):
# Within this algorithm the order of steps described in the
@@ -196,7 +196,7 @@ class TreeBuilder(object):
# Step 6
while entry != Marker and entry not in self.openElements:
if i == 0:
#This will be reset to 0 below
# This will be reset to 0 below
i = -1
break
i -= 1
@@ -209,13 +209,13 @@ class TreeBuilder(object):
# Step 8
entry = self.activeFormattingElements[i]
clone = entry.cloneNode() #Mainly to get a new copy of the attributes
clone = entry.cloneNode() # Mainly to get a new copy of the attributes
# Step 9
element = self.insertElement({"type":"StartTag",
"name":clone.name,
"namespace":clone.namespace,
"data":clone.attributes})
element = self.insertElement({"type": "StartTag",
"name": clone.name,
"namespace": clone.namespace,
"data": clone.attributes})
# Step 10
self.activeFormattingElements[i] = element
@@ -260,7 +260,7 @@ class TreeBuilder(object):
if parent is None:
parent = self.openElements[-1]
parent.appendChild(self.commentClass(token["data"]))
def createElement(self, token):
"""Create an element but don't insert it anywhere"""
name = token["name"]
@@ -282,10 +282,10 @@ class TreeBuilder(object):
self.insertElement = self.insertElementNormal
insertFromTable = property(_getInsertFromTable, _setInsertFromTable)
def insertElementNormal(self, token):
name = token["name"]
assert type(name) == unicode, "Element %s not unicode"%name
assert isinstance(name, text_type), "Element %s not unicode" % name
namespace = token.get("namespace", self.defaultNamespace)
element = self.elementClass(name, namespace)
element.attributes = token["data"]
@@ -294,13 +294,13 @@ class TreeBuilder(object):
return element
def insertElementTable(self, token):
"""Create an element and insert it into the tree"""
"""Create an element and insert it into the tree"""
element = self.createElement(token)
if self.openElements[-1].name not in tableInsertModeElements:
return self.insertElementNormal(token)
else:
#We should be in the InTable mode. This means we want to do
#special magic element rearranging
# We should be in the InTable mode. This means we want to do
# special magic element rearranging
parent, insertBefore = self.getTableMisnestedNodePosition()
if insertBefore is None:
parent.appendChild(element)
@@ -315,7 +315,7 @@ class TreeBuilder(object):
parent = self.openElements[-1]
if (not self.insertFromTable or (self.insertFromTable and
self.openElements[-1].name
self.openElements[-1].name
not in tableInsertModeElements)):
parent.insertText(data)
else:
@@ -323,14 +323,14 @@ class TreeBuilder(object):
# special magic element rearranging
parent, insertBefore = self.getTableMisnestedNodePosition()
parent.insertText(data, insertBefore)
def getTableMisnestedNodePosition(self):
"""Get the foster parent element, and sibling to insert before
(or None) when inserting a misnested table node"""
# The foster parent element is the one which comes before the most
# recently opened table element
# XXX - this is really inelegant
lastTable=None
lastTable = None
fosterParent = None
insertBefore = None
for elm in self.openElements[::-1]:
@@ -354,7 +354,7 @@ class TreeBuilder(object):
name = self.openElements[-1].name
# XXX td, th and tr are not actually needed
if (name in frozenset(("dd", "dt", "li", "option", "optgroup", "p", "rp", "rt"))
and name != exclude):
and name != exclude):
self.openElements.pop()
# XXX This is not entirely what the specification says. We should
# investigate it more closely.
@@ -363,10 +363,10 @@ class TreeBuilder(object):
def getDocument(self):
"Return the final tree"
return self.document
def getFragment(self):
"Return the final fragment"
#assert self.innerHTML
# assert self.innerHTML
fragment = self.fragmentClass()
self.openElements[0].reparentChildren(fragment)
return fragment

View File

@@ -1,45 +1,38 @@
from __future__ import absolute_import, division, unicode_literals
from xml.dom import minidom, Node, XML_NAMESPACE, XMLNS_NAMESPACE
try:
from types import ModuleType
except:
from new import module as ModuleType
import re
from xml.dom import minidom, Node
import weakref
import _base
from html5lib import constants, ihatexml
from html5lib.constants import namespaces
from . import _base
from .. import constants
from ..constants import namespaces
from ..utils import moduleFactoryFactory
moduleCache = {}
def getDomModule(DomImplementation):
name = "_" + DomImplementation.__name__+"builder"
if name in moduleCache:
return moduleCache[name]
else:
mod = ModuleType(name)
objs = getDomBuilder(DomImplementation)
mod.__dict__.update(objs)
moduleCache[name] = mod
return mod
def getDomBuilder(DomImplementation):
Dom = DomImplementation
class AttrList(object):
def __init__(self, element):
self.element = element
def __iter__(self):
return self.element.attributes.items().__iter__()
return list(self.element.attributes.items()).__iter__()
def __setitem__(self, name, value):
self.element.setAttribute(name, value)
def __len__(self):
return len(self.element.attributes.items())
return len(list(self.element.attributes.items()))
def items(self):
return [(item[0], item[1]) for item in
self.element.attributes.items()]
list(self.element.attributes.items())]
def keys(self):
return self.element.attributes.keys()
return list(self.element.attributes.keys())
def __getitem__(self, name):
return self.element.getAttribute(name)
@@ -48,68 +41,68 @@ def getDomBuilder(DomImplementation):
raise NotImplementedError
else:
return self.element.hasAttribute(name)
class NodeBuilder(_base.Node):
def __init__(self, element):
_base.Node.__init__(self, element.nodeName)
self.element = element
namespace = property(lambda self:hasattr(self.element, "namespaceURI")
namespace = property(lambda self: hasattr(self.element, "namespaceURI")
and self.element.namespaceURI or None)
def appendChild(self, node):
node.parent = self
self.element.appendChild(node.element)
def insertText(self, data, insertBefore=None):
text = self.element.ownerDocument.createTextNode(data)
if insertBefore:
self.element.insertBefore(text, insertBefore.element)
else:
self.element.appendChild(text)
def insertBefore(self, node, refNode):
self.element.insertBefore(node.element, refNode.element)
node.parent = self
def removeChild(self, node):
if node.element.parentNode == self.element:
self.element.removeChild(node.element)
node.parent = None
def reparentChildren(self, newParent):
while self.element.hasChildNodes():
child = self.element.firstChild
self.element.removeChild(child)
newParent.element.appendChild(child)
self.childNodes = []
def getAttributes(self):
return AttrList(self.element)
def setAttributes(self, attributes):
if attributes:
for name, value in attributes.items():
for name, value in list(attributes.items()):
if isinstance(name, tuple):
if name[0] is not None:
qualifiedName = (name[0] + ":" + name[1])
else:
qualifiedName = name[1]
self.element.setAttributeNS(name[2], qualifiedName,
self.element.setAttributeNS(name[2], qualifiedName,
value)
else:
self.element.setAttribute(
name, value)
attributes = property(getAttributes, setAttributes)
def cloneNode(self):
return NodeBuilder(self.element.cloneNode(False))
def hasContent(self):
return self.element.hasChildNodes()
def getNameTuple(self):
if self.namespace == None:
if self.namespace is None:
return namespaces["html"], self.name
else:
return self.namespace, self.name
@@ -118,9 +111,9 @@ def getDomBuilder(DomImplementation):
class TreeBuilder(_base.TreeBuilder):
def documentClass(self):
self.dom = Dom.getDOMImplementation().createDocument(None,None,None)
self.dom = Dom.getDOMImplementation().createDocument(None, None, None)
return weakref.proxy(self)
def insertDoctype(self, token):
name = token["name"]
publicId = token["publicId"]
@@ -131,7 +124,7 @@ def getDomBuilder(DomImplementation):
self.document.appendChild(NodeBuilder(doctype))
if Dom == minidom:
doctype.ownerDocument = self.dom
def elementClass(self, name, namespace=None):
if namespace is None and self.defaultNamespace is None:
node = self.dom.createElement(name)
@@ -139,70 +132,72 @@ def getDomBuilder(DomImplementation):
node = self.dom.createElementNS(namespace, name)
return NodeBuilder(node)
def commentClass(self, data):
return NodeBuilder(self.dom.createComment(data))
def fragmentClass(self):
return NodeBuilder(self.dom.createDocumentFragment())
def appendChild(self, node):
self.dom.appendChild(node.element)
def testSerializer(self, element):
return testSerializer(element)
def getDocument(self):
return self.dom
def getFragment(self):
return _base.TreeBuilder.getFragment(self).element
def insertText(self, data, parent=None):
data=data
if parent <> self:
data = data
if parent != self:
_base.TreeBuilder.insertText(self, data, parent)
else:
# HACK: allow text nodes as children of the document node
if hasattr(self.dom, '_child_node_types'):
if not Node.TEXT_NODE in self.dom._child_node_types:
self.dom._child_node_types=list(self.dom._child_node_types)
self.dom._child_node_types = list(self.dom._child_node_types)
self.dom._child_node_types.append(Node.TEXT_NODE)
self.dom.appendChild(self.dom.createTextNode(data))
implementation = DomImplementation
name = None
def testSerializer(element):
element.normalize()
rv = []
def serializeElement(element, indent=0):
if element.nodeType == Node.DOCUMENT_TYPE_NODE:
if element.name:
if element.publicId or element.systemId:
publicId = element.publicId or ""
systemId = element.systemId or ""
rv.append( """|%s<!DOCTYPE %s "%s" "%s">"""%(
' '*indent, element.name, publicId, systemId))
rv.append("""|%s<!DOCTYPE %s "%s" "%s">""" %
(' ' * indent, element.name, publicId, systemId))
else:
rv.append("|%s<!DOCTYPE %s>"%(' '*indent, element.name))
rv.append("|%s<!DOCTYPE %s>" % (' ' * indent, element.name))
else:
rv.append("|%s<!DOCTYPE >"%(' '*indent,))
rv.append("|%s<!DOCTYPE >" % (' ' * indent,))
elif element.nodeType == Node.DOCUMENT_NODE:
rv.append("#document")
elif element.nodeType == Node.DOCUMENT_FRAGMENT_NODE:
rv.append("#document-fragment")
elif element.nodeType == Node.COMMENT_NODE:
rv.append("|%s<!-- %s -->"%(' '*indent, element.nodeValue))
rv.append("|%s<!-- %s -->" % (' ' * indent, element.nodeValue))
elif element.nodeType == Node.TEXT_NODE:
rv.append("|%s\"%s\"" %(' '*indent, element.nodeValue))
rv.append("|%s\"%s\"" % (' ' * indent, element.nodeValue))
else:
if (hasattr(element, "namespaceURI") and
element.namespaceURI != None):
name = "%s %s"%(constants.prefixes[element.namespaceURI],
element.nodeName)
element.namespaceURI is not None):
name = "%s %s" % (constants.prefixes[element.namespaceURI],
element.nodeName)
else:
name = element.nodeName
rv.append("|%s<%s>"%(' '*indent, name))
rv.append("|%s<%s>" % (' ' * indent, name))
if element.hasAttributes():
attributes = []
for i in range(len(element.attributes)):
@@ -211,81 +206,22 @@ def getDomBuilder(DomImplementation):
value = attr.value
ns = attr.namespaceURI
if ns:
name = "%s %s"%(constants.prefixes[ns], attr.localName)
name = "%s %s" % (constants.prefixes[ns], attr.localName)
else:
name = attr.nodeName
attributes.append((name, value))
for name, value in sorted(attributes):
rv.append('|%s%s="%s"' % (' '*(indent+2), name, value))
rv.append('|%s%s="%s"' % (' ' * (indent + 2), name, value))
indent += 2
for child in element.childNodes:
serializeElement(child, indent)
serializeElement(element, 0)
return "\n".join(rv)
def dom2sax(node, handler, nsmap={'xml':XML_NAMESPACE}):
if node.nodeType == Node.ELEMENT_NODE:
if not nsmap:
handler.startElement(node.nodeName, node.attributes)
for child in node.childNodes: dom2sax(child, handler, nsmap)
handler.endElement(node.nodeName)
else:
attributes = dict(node.attributes.itemsNS())
# gather namespace declarations
prefixes = []
for attrname in node.attributes.keys():
attr = node.getAttributeNode(attrname)
if (attr.namespaceURI == XMLNS_NAMESPACE or
(attr.namespaceURI == None and attr.nodeName.startswith('xmlns'))):
prefix = (attr.nodeName != 'xmlns' and attr.nodeName or None)
handler.startPrefixMapping(prefix, attr.nodeValue)
prefixes.append(prefix)
nsmap = nsmap.copy()
nsmap[prefix] = attr.nodeValue
del attributes[(attr.namespaceURI, attr.nodeName)]
# apply namespace declarations
for attrname in node.attributes.keys():
attr = node.getAttributeNode(attrname)
if attr.namespaceURI == None and ':' in attr.nodeName:
prefix = attr.nodeName.split(':')[0]
if nsmap.has_key(prefix):
del attributes[(attr.namespaceURI, attr.nodeName)]
attributes[(nsmap[prefix],attr.nodeName)]=attr.nodeValue
# SAX events
ns = node.namespaceURI or nsmap.get(None,None)
handler.startElementNS((ns,node.nodeName), node.nodeName, attributes)
for child in node.childNodes: dom2sax(child, handler, nsmap)
handler.endElementNS((ns, node.nodeName), node.nodeName)
for prefix in prefixes: handler.endPrefixMapping(prefix)
elif node.nodeType in [Node.TEXT_NODE, Node.CDATA_SECTION_NODE]:
handler.characters(node.nodeValue)
elif node.nodeType == Node.DOCUMENT_NODE:
handler.startDocument()
for child in node.childNodes: dom2sax(child, handler, nsmap)
handler.endDocument()
elif node.nodeType == Node.DOCUMENT_FRAGMENT_NODE:
for child in node.childNodes: dom2sax(child, handler, nsmap)
else:
# ATTRIBUTE_NODE
# ENTITY_NODE
# PROCESSING_INSTRUCTION_NODE
# COMMENT_NODE
# DOCUMENT_TYPE_NODE
# NOTATION_NODE
pass
return locals()
# Keep backwards compatibility with things that directly load
# classes/functions from this module
for key, value in getDomModule(minidom).__dict__.items():
globals()[key] = value
# The actual means to get a module!
getDomModule = moduleFactoryFactory(getDomBuilder)

View File

@@ -1,32 +1,21 @@
try:
from types import ModuleType
except:
from new import module as ModuleType
import re
import types
from __future__ import absolute_import, division, unicode_literals
from six import text_type
import _base
from html5lib import ihatexml
from html5lib import constants
from html5lib.constants import namespaces
import re
from . import _base
from .. import ihatexml
from .. import constants
from ..constants import namespaces
from ..utils import moduleFactoryFactory
tag_regexp = re.compile("{([^}]*)}(.*)")
moduleCache = {}
def getETreeModule(ElementTreeImplementation, fullTree=False):
name = "_" + ElementTreeImplementation.__name__+"builder"
if name in moduleCache:
return moduleCache[name]
else:
mod = ModuleType("_" + ElementTreeImplementation.__name__+"builder")
objs = getETreeBuilder(ElementTreeImplementation, fullTree)
mod.__dict__.update(objs)
moduleCache[name] = mod
return mod
def getETreeBuilder(ElementTreeImplementation, fullTree=False):
ElementTree = ElementTreeImplementation
ElementTreeCommentType = ElementTree.Comment("asd").tag
class Element(_base.Node):
def __init__(self, name, namespace=None):
self._name = name
@@ -45,16 +34,16 @@ def getETreeBuilder(ElementTreeImplementation, fullTree=False):
if namespace is None:
etree_tag = name
else:
etree_tag = "{%s}%s"%(namespace, name)
etree_tag = "{%s}%s" % (namespace, name)
return etree_tag
def _setName(self, name):
self._name = name
self._element.tag = self._getETreeTag(self._name, self._namespace)
def _getName(self):
return self._name
name = property(_getName, _setName)
def _setNamespace(self, namespace):
@@ -65,81 +54,82 @@ def getETreeBuilder(ElementTreeImplementation, fullTree=False):
return self._namespace
namespace = property(_getNamespace, _setNamespace)
def _getAttributes(self):
return self._element.attrib
def _setAttributes(self, attributes):
#Delete existing attributes first
#XXX - there may be a better way to do this...
for key in self._element.attrib.keys():
# Delete existing attributes first
# XXX - there may be a better way to do this...
for key in list(self._element.attrib.keys()):
del self._element.attrib[key]
for key, value in attributes.iteritems():
for key, value in attributes.items():
if isinstance(key, tuple):
name = "{%s}%s"%(key[2], key[1])
name = "{%s}%s" % (key[2], key[1])
else:
name = key
self._element.set(name, value)
attributes = property(_getAttributes, _setAttributes)
def _getChildNodes(self):
return self._childNodes
return self._childNodes
def _setChildNodes(self, value):
del self._element[:]
self._childNodes = []
for element in value:
self.insertChild(element)
childNodes = property(_getChildNodes, _setChildNodes)
def hasContent(self):
"""Return true if the node has children or text"""
return bool(self._element.text or len(self._element))
def appendChild(self, node):
self._childNodes.append(node)
self._element.append(node._element)
node.parent = self
def insertBefore(self, node, refNode):
index = list(self._element).index(refNode._element)
self._element.insert(index, node._element)
node.parent = self
def removeChild(self, node):
self._element.remove(node._element)
node.parent=None
node.parent = None
def insertText(self, data, insertBefore=None):
if not(len(self._element)):
if not self._element.text:
self._element.text = ""
self._element.text += data
elif insertBefore is None:
#Insert the text as the tail of the last child element
# Insert the text as the tail of the last child element
if not self._element[-1].tail:
self._element[-1].tail = ""
self._element[-1].tail += data
else:
#Insert the text before the specified node
# Insert the text before the specified node
children = list(self._element)
index = children.index(insertBefore._element)
if index > 0:
if not self._element[index-1].tail:
self._element[index-1].tail = ""
self._element[index-1].tail += data
if not self._element[index - 1].tail:
self._element[index - 1].tail = ""
self._element[index - 1].tail += data
else:
if not self._element.text:
self._element.text = ""
self._element.text += data
def cloneNode(self):
element = type(self)(self.name, self.namespace)
for name, value in self.attributes.iteritems():
for name, value in self.attributes.items():
element.attributes[name] = value
return element
def reparentChildren(self, newParent):
if newParent.childNodes:
newParent.childNodes[-1]._element.tail += self._element.text
@@ -150,60 +140,60 @@ def getETreeBuilder(ElementTreeImplementation, fullTree=False):
newParent._element.text += self._element.text
self._element.text = ""
_base.Node.reparentChildren(self, newParent)
class Comment(Element):
def __init__(self, data):
#Use the superclass constructor to set all properties on the
#wrapper element
# Use the superclass constructor to set all properties on the
# wrapper element
self._element = ElementTree.Comment(data)
self.parent = None
self._childNodes = []
self._flags = []
def _getData(self):
return self._element.text
def _setData(self, value):
self._element.text = value
data = property(_getData, _setData)
class DocumentType(Element):
def __init__(self, name, publicId, systemId):
Element.__init__(self, "<!DOCTYPE>")
Element.__init__(self, "<!DOCTYPE>")
self._element.text = name
self.publicId = publicId
self.systemId = systemId
def _getPublicId(self):
return self._element.get(u"publicId", "")
return self._element.get("publicId", "")
def _setPublicId(self, value):
if value is not None:
self._element.set(u"publicId", value)
self._element.set("publicId", value)
publicId = property(_getPublicId, _setPublicId)
def _getSystemId(self):
return self._element.get(u"systemId", "")
return self._element.get("systemId", "")
def _setSystemId(self, value):
if value is not None:
self._element.set(u"systemId", value)
self._element.set("systemId", value)
systemId = property(_getSystemId, _setSystemId)
class Document(Element):
def __init__(self):
Element.__init__(self, "<DOCUMENT_ROOT>")
Element.__init__(self, "DOCUMENT_ROOT")
class DocumentFragment(Element):
def __init__(self):
Element.__init__(self, "<DOCUMENT_FRAGMENT>")
Element.__init__(self, "DOCUMENT_FRAGMENT")
def testSerializer(element):
rv = []
finalText = None
def serializeElement(element, indent=0):
if not(hasattr(element, "tag")):
element = element.getroot()
@@ -211,20 +201,23 @@ def getETreeBuilder(ElementTreeImplementation, fullTree=False):
if element.get("publicId") or element.get("systemId"):
publicId = element.get("publicId") or ""
systemId = element.get("systemId") or ""
rv.append( """<!DOCTYPE %s "%s" "%s">"""%(
element.text, publicId, systemId))
else:
rv.append("<!DOCTYPE %s>"%(element.text,))
elif element.tag == "<DOCUMENT_ROOT>":
rv.append("""<!DOCTYPE %s "%s" "%s">""" %
(element.text, publicId, systemId))
else:
rv.append("<!DOCTYPE %s>" % (element.text,))
elif element.tag == "DOCUMENT_ROOT":
rv.append("#document")
if element.text:
rv.append("|%s\"%s\""%(' '*(indent+2), element.text))
if element.tail:
finalText = element.tail
elif element.tag == ElementTree.Comment:
rv.append("|%s<!-- %s -->"%(' '*indent, element.text))
if element.text is not None:
rv.append("|%s\"%s\"" % (' ' * (indent + 2), element.text))
if element.tail is not None:
raise TypeError("Document node cannot have tail")
if hasattr(element, "attrib") and len(element.attrib):
raise TypeError("Document node cannot have attributes")
elif element.tag == ElementTreeCommentType:
rv.append("|%s<!-- %s -->" % (' ' * indent, element.text))
else:
assert type(element.tag) in types.StringTypes, "Expected unicode, got %s"%type(element.tag)
assert isinstance(element.tag, text_type), \
"Expected unicode, got %s, %s" % (type(element.tag), element.tag)
nsmatch = tag_regexp.match(element.tag)
if nsmatch is None:
@@ -232,113 +225,113 @@ def getETreeBuilder(ElementTreeImplementation, fullTree=False):
else:
ns, name = nsmatch.groups()
prefix = constants.prefixes[ns]
name = "%s %s"%(prefix, name)
rv.append("|%s<%s>"%(' '*indent, name))
name = "%s %s" % (prefix, name)
rv.append("|%s<%s>" % (' ' * indent, name))
if hasattr(element, "attrib"):
attributes = []
for name, value in element.attrib.iteritems():
for name, value in element.attrib.items():
nsmatch = tag_regexp.match(name)
if nsmatch is not None:
ns, name = nsmatch.groups()
prefix = constants.prefixes[ns]
attr_string = "%s %s"%(prefix, name)
attr_string = "%s %s" % (prefix, name)
else:
attr_string = name
attributes.append((attr_string, value))
for name, value in sorted(attributes):
rv.append('|%s%s="%s"' % (' '*(indent+2), name, value))
rv.append('|%s%s="%s"' % (' ' * (indent + 2), name, value))
if element.text:
rv.append("|%s\"%s\"" %(' '*(indent+2), element.text))
rv.append("|%s\"%s\"" % (' ' * (indent + 2), element.text))
indent += 2
for child in element:
serializeElement(child, indent)
if element.tail:
rv.append("|%s\"%s\"" %(' '*(indent-2), element.tail))
rv.append("|%s\"%s\"" % (' ' * (indent - 2), element.tail))
serializeElement(element, 0)
if finalText is not None:
rv.append("|%s\"%s\""%(' '*2, finalText))
return "\n".join(rv)
def tostring(element):
"""Serialize an element and its child nodes to a string"""
rv = []
finalText = None
filter = ihatexml.InfosetFilter()
def serializeElement(element):
if type(element) == type(ElementTree.ElementTree):
if isinstance(element, ElementTree.ElementTree):
element = element.getroot()
if element.tag == "<!DOCTYPE>":
if element.get("publicId") or element.get("systemId"):
publicId = element.get("publicId") or ""
systemId = element.get("systemId") or ""
rv.append( """<!DOCTYPE %s PUBLIC "%s" "%s">"""%(
element.text, publicId, systemId))
else:
rv.append("<!DOCTYPE %s>"%(element.text,))
elif element.tag == "<DOCUMENT_ROOT>":
if element.text:
rv.append(element.text)
if element.tail:
finalText = element.tail
for child in element:
serializeElement(child)
elif type(element.tag) == type(ElementTree.Comment):
rv.append("<!--%s-->"%(element.text,))
else:
#This is assumed to be an ordinary element
if not element.attrib:
rv.append("<%s>"%(filter.fromXmlName(element.tag),))
rv.append("""<!DOCTYPE %s PUBLIC "%s" "%s">""" %
(element.text, publicId, systemId))
else:
attr = " ".join(["%s=\"%s\""%(
filter.fromXmlName(name), value)
for name, value in element.attrib.iteritems()])
rv.append("<%s %s>"%(element.tag, attr))
if element.text:
rv.append("<!DOCTYPE %s>" % (element.text,))
elif element.tag == "DOCUMENT_ROOT":
if element.text is not None:
rv.append(element.text)
if element.tail is not None:
raise TypeError("Document node cannot have tail")
if hasattr(element, "attrib") and len(element.attrib):
raise TypeError("Document node cannot have attributes")
for child in element:
serializeElement(child)
rv.append("</%s>"%(element.tag,))
elif element.tag == ElementTreeCommentType:
rv.append("<!--%s-->" % (element.text,))
else:
# This is assumed to be an ordinary element
if not element.attrib:
rv.append("<%s>" % (filter.fromXmlName(element.tag),))
else:
attr = " ".join(["%s=\"%s\"" % (
filter.fromXmlName(name), value)
for name, value in element.attrib.items()])
rv.append("<%s %s>" % (element.tag, attr))
if element.text:
rv.append(element.text)
for child in element:
serializeElement(child)
rv.append("</%s>" % (element.tag,))
if element.tail:
rv.append(element.tail)
serializeElement(element)
if finalText is not None:
rv.append("%s\""%(' '*2, finalText))
return "".join(rv)
class TreeBuilder(_base.TreeBuilder):
documentClass = Document
doctypeClass = DocumentType
elementClass = Element
commentClass = Comment
fragmentClass = DocumentFragment
implementation = ElementTreeImplementation
def testSerializer(self, element):
return testSerializer(element)
def getDocument(self):
if fullTree:
return self.document._element
else:
if self.defaultNamespace is not None:
return self.document._element.find(
"{%s}html"%self.defaultNamespace)
"{%s}html" % self.defaultNamespace)
else:
return self.document._element.find("html")
def getFragment(self):
return _base.TreeBuilder.getFragment(self)._element
return locals()
getETreeModule = moduleFactoryFactory(getETreeBuilder)

View File

@@ -1,20 +1,3 @@
import warnings
import re
import _base
from html5lib.constants import DataLossWarning
import html5lib.constants as constants
import etree as etree_builders
from html5lib import ihatexml
try:
import lxml.etree as etree
except ImportError:
pass
fullTree = True
tag_regexp = re.compile("{([^}]*)}(.*)")
"""Module for supporting the lxml.etree library. The idea here is to use as much
of the native library as possible, without using fragile hacks like custom element
names that break between releases. The downside of this is that we cannot represent
@@ -26,12 +9,34 @@ Docypes with no name
When any of these things occur, we emit a DataLossWarning
"""
from __future__ import absolute_import, division, unicode_literals
import warnings
import re
import sys
from . import _base
from ..constants import DataLossWarning
from .. import constants
from . import etree as etree_builders
from .. import ihatexml
import lxml.etree as etree
fullTree = True
tag_regexp = re.compile("{([^}]*)}(.*)")
comment_type = etree.Comment("asd").tag
class DocumentType(object):
def __init__(self, name, publicId, systemId):
self.name = name
self.name = name
self.publicId = publicId
self.systemId = systemId
class Document(object):
def __init__(self):
self._elementTree = None
@@ -42,118 +47,126 @@ class Document(object):
def _getChildNodes(self):
return self._childNodes
childNodes = property(_getChildNodes)
def testSerializer(element):
rv = []
finalText = None
filter = ihatexml.InfosetFilter()
infosetFilter = ihatexml.InfosetFilter()
def serializeElement(element, indent=0):
if not hasattr(element, "tag"):
if hasattr(element, "getroot"):
#Full tree case
if hasattr(element, "getroot"):
# Full tree case
rv.append("#document")
if element.docinfo.internalDTD:
if not (element.docinfo.public_id or
if not (element.docinfo.public_id or
element.docinfo.system_url):
dtd_str = "<!DOCTYPE %s>"%element.docinfo.root_name
dtd_str = "<!DOCTYPE %s>" % element.docinfo.root_name
else:
dtd_str = """<!DOCTYPE %s "%s" "%s">"""%(
element.docinfo.root_name,
dtd_str = """<!DOCTYPE %s "%s" "%s">""" % (
element.docinfo.root_name,
element.docinfo.public_id,
element.docinfo.system_url)
rv.append("|%s%s"%(' '*(indent+2), dtd_str))
rv.append("|%s%s" % (' ' * (indent + 2), dtd_str))
next_element = element.getroot()
while next_element.getprevious() is not None:
next_element = next_element.getprevious()
while next_element is not None:
serializeElement(next_element, indent+2)
serializeElement(next_element, indent + 2)
next_element = next_element.getnext()
elif isinstance(element, basestring):
#Text in a fragment
rv.append("|%s\"%s\""%(' '*indent, element))
elif isinstance(element, str) or isinstance(element, bytes):
# Text in a fragment
assert isinstance(element, str) or sys.version_info.major == 2
rv.append("|%s\"%s\"" % (' ' * indent, element))
else:
#Fragment case
# Fragment case
rv.append("#document-fragment")
for next_element in element:
serializeElement(next_element, indent+2)
elif type(element.tag) == type(etree.Comment):
rv.append("|%s<!-- %s -->"%(' '*indent, element.text))
serializeElement(next_element, indent + 2)
elif element.tag == comment_type:
rv.append("|%s<!-- %s -->" % (' ' * indent, element.text))
if hasattr(element, "tail") and element.tail:
rv.append("|%s\"%s\"" % (' ' * indent, element.tail))
else:
assert isinstance(element, etree._Element)
nsmatch = etree_builders.tag_regexp.match(element.tag)
if nsmatch is not None:
ns = nsmatch.group(1)
tag = nsmatch.group(2)
prefix = constants.prefixes[ns]
rv.append("|%s<%s %s>"%(' '*indent, prefix,
filter.fromXmlName(tag)))
rv.append("|%s<%s %s>" % (' ' * indent, prefix,
infosetFilter.fromXmlName(tag)))
else:
rv.append("|%s<%s>"%(' '*indent,
filter.fromXmlName(element.tag)))
rv.append("|%s<%s>" % (' ' * indent,
infosetFilter.fromXmlName(element.tag)))
if hasattr(element, "attrib"):
attributes = []
for name, value in element.attrib.iteritems():
for name, value in element.attrib.items():
nsmatch = tag_regexp.match(name)
if nsmatch is not None:
ns, name = nsmatch.groups()
name = filter.fromXmlName(name)
name = infosetFilter.fromXmlName(name)
prefix = constants.prefixes[ns]
attr_string = "%s %s"%(prefix, name)
attr_string = "%s %s" % (prefix, name)
else:
attr_string = filter.fromXmlName(name)
attr_string = infosetFilter.fromXmlName(name)
attributes.append((attr_string, value))
for name, value in sorted(attributes):
rv.append('|%s%s="%s"' % (' '*(indent+2), name, value))
rv.append('|%s%s="%s"' % (' ' * (indent + 2), name, value))
if element.text:
rv.append("|%s\"%s\"" %(' '*(indent+2), element.text))
rv.append("|%s\"%s\"" % (' ' * (indent + 2), element.text))
indent += 2
for child in element.getchildren():
for child in element:
serializeElement(child, indent)
if hasattr(element, "tail") and element.tail:
rv.append("|%s\"%s\"" %(' '*(indent-2), element.tail))
if hasattr(element, "tail") and element.tail:
rv.append("|%s\"%s\"" % (' ' * (indent - 2), element.tail))
serializeElement(element, 0)
if finalText is not None:
rv.append("|%s\"%s\""%(' '*2, finalText))
rv.append("|%s\"%s\"" % (' ' * 2, finalText))
return "\n".join(rv)
def tostring(element):
"""Serialize an element and its child nodes to a string"""
rv = []
finalText = None
def serializeElement(element):
if not hasattr(element, "tag"):
if element.docinfo.internalDTD:
if element.docinfo.doctype:
dtd_str = element.docinfo.doctype
else:
dtd_str = "<!DOCTYPE %s>"%element.docinfo.root_name
dtd_str = "<!DOCTYPE %s>" % element.docinfo.root_name
rv.append(dtd_str)
serializeElement(element.getroot())
elif type(element.tag) == type(etree.Comment):
rv.append("<!--%s-->"%(element.text,))
elif element.tag == comment_type:
rv.append("<!--%s-->" % (element.text,))
else:
#This is assumed to be an ordinary element
# This is assumed to be an ordinary element
if not element.attrib:
rv.append("<%s>"%(element.tag,))
rv.append("<%s>" % (element.tag,))
else:
attr = " ".join(["%s=\"%s\""%(name, value)
for name, value in element.attrib.iteritems()])
rv.append("<%s %s>"%(element.tag, attr))
attr = " ".join(["%s=\"%s\"" % (name, value)
for name, value in element.attrib.items()])
rv.append("<%s %s>" % (element.tag, attr))
if element.text:
rv.append(element.text)
for child in element.getchildren():
for child in element:
serializeElement(child)
rv.append("</%s>"%(element.tag,))
rv.append("</%s>" % (element.tag,))
if hasattr(element, "tail") and element.tail:
rv.append(element.tail)
@@ -161,56 +174,57 @@ def tostring(element):
serializeElement(element)
if finalText is not None:
rv.append("%s\""%(' '*2, finalText))
rv.append("%s\"" % (' ' * 2, finalText))
return "".join(rv)
class TreeBuilder(_base.TreeBuilder):
documentClass = Document
doctypeClass = DocumentType
elementClass = None
commentClass = None
fragmentClass = Document
fragmentClass = Document
implementation = etree
def __init__(self, namespaceHTMLElements, fullTree = False):
def __init__(self, namespaceHTMLElements, fullTree=False):
builder = etree_builders.getETreeModule(etree, fullTree=fullTree)
filter = self.filter = ihatexml.InfosetFilter()
infosetFilter = self.infosetFilter = ihatexml.InfosetFilter()
self.namespaceHTMLElements = namespaceHTMLElements
class Attributes(dict):
def __init__(self, element, value={}):
self._element = element
dict.__init__(self, value)
for key, value in self.iteritems():
for key, value in self.items():
if isinstance(key, tuple):
name = "{%s}%s"%(key[2], filter.coerceAttribute(key[1]))
name = "{%s}%s" % (key[2], infosetFilter.coerceAttribute(key[1]))
else:
name = filter.coerceAttribute(key)
name = infosetFilter.coerceAttribute(key)
self._element._element.attrib[name] = value
def __setitem__(self, key, value):
dict.__setitem__(self, key, value)
if isinstance(key, tuple):
name = "{%s}%s"%(key[2], filter.coerceAttribute(key[1]))
name = "{%s}%s" % (key[2], infosetFilter.coerceAttribute(key[1]))
else:
name = filter.coerceAttribute(key)
name = infosetFilter.coerceAttribute(key)
self._element._element.attrib[name] = value
class Element(builder.Element):
def __init__(self, name, namespace):
name = filter.coerceElement(name)
name = infosetFilter.coerceElement(name)
builder.Element.__init__(self, name, namespace=namespace)
self._attributes = Attributes(self)
def _setName(self, name):
self._name = filter.coerceElement(name)
self._name = infosetFilter.coerceElement(name)
self._element.tag = self._getETreeTag(
self._name, self._namespace)
def _getName(self):
return filter.fromXmlName(self._name)
return infosetFilter.fromXmlName(self._name)
name = property(_getName, _setName)
def _getAttributes(self):
@@ -218,24 +232,23 @@ class TreeBuilder(_base.TreeBuilder):
def _setAttributes(self, attributes):
self._attributes = Attributes(self, attributes)
attributes = property(_getAttributes, _setAttributes)
def insertText(self, data, insertBefore=None):
data = filter.coerceCharacters(data)
data = infosetFilter.coerceCharacters(data)
builder.Element.insertText(self, data, insertBefore)
def appendChild(self, child):
builder.Element.appendChild(self, child)
class Comment(builder.Comment):
def __init__(self, data):
data = filter.coerceComment(data)
data = infosetFilter.coerceComment(data)
builder.Comment.__init__(self, data)
def _setData(self, data):
data = filter.coerceComment(data)
data = infosetFilter.coerceComment(data)
self._element.text = data
def _getData(self):
@@ -245,9 +258,9 @@ class TreeBuilder(_base.TreeBuilder):
self.elementClass = Element
self.commentClass = builder.Comment
#self.fragmentClass = builder.DocumentFragment
# self.fragmentClass = builder.DocumentFragment
_base.TreeBuilder.__init__(self, namespaceHTMLElements)
def reset(self):
_base.TreeBuilder.reset(self)
self.insertComment = self.insertCommentInitial
@@ -262,13 +275,13 @@ class TreeBuilder(_base.TreeBuilder):
return self.document._elementTree
else:
return self.document._elementTree.getroot()
def getFragment(self):
fragment = []
element = self.openElements[0]._element
if element.text:
fragment.append(element.text)
fragment.extend(element.getchildren())
fragment.extend(list(element))
if element.tail:
fragment.append(element.tail)
return fragment
@@ -278,59 +291,79 @@ class TreeBuilder(_base.TreeBuilder):
publicId = token["publicId"]
systemId = token["systemId"]
if not name or ihatexml.nonXmlNameBMPRegexp.search(name) or name[0] == '"':
warnings.warn("lxml cannot represent null or non-xml doctype", DataLossWarning)
if not name:
warnings.warn("lxml cannot represent empty doctype", DataLossWarning)
self.doctype = None
else:
coercedName = self.infosetFilter.coerceElement(name)
if coercedName != name:
warnings.warn("lxml cannot represent non-xml doctype", DataLossWarning)
doctype = self.doctypeClass(coercedName, publicId, systemId)
self.doctype = doctype
doctype = self.doctypeClass(name, publicId, systemId)
self.doctype = doctype
def insertCommentInitial(self, data, parent=None):
self.initial_comments.append(data)
def insertCommentMain(self, data, parent=None):
if (parent == self.document and
self.document._elementTree.getroot()[-1].tag == comment_type):
warnings.warn("lxml cannot represent adjacent comments beyond the root elements", DataLossWarning)
super(TreeBuilder, self).insertComment(data, parent)
def insertRoot(self, token):
"""Create the document root"""
#Because of the way libxml2 works, it doesn't seem to be possible to
#alter information like the doctype after the tree has been parsed.
#Therefore we need to use the built-in parser to create our iniial
#tree, after which we can add elements like normal
# Because of the way libxml2 works, it doesn't seem to be possible to
# alter information like the doctype after the tree has been parsed.
# Therefore we need to use the built-in parser to create our iniial
# tree, after which we can add elements like normal
docStr = ""
if self.doctype and self.doctype.name and not self.doctype.name.startswith('"'):
docStr += "<!DOCTYPE %s"%self.doctype.name
if (self.doctype.publicId is not None or
self.doctype.systemId is not None):
docStr += ' PUBLIC "%s" "%s"'%(self.doctype.publicId or "",
self.doctype.systemId or "")
if self.doctype:
assert self.doctype.name
docStr += "<!DOCTYPE %s" % self.doctype.name
if (self.doctype.publicId is not None or
self.doctype.systemId is not None):
docStr += (' PUBLIC "%s" ' %
(self.infosetFilter.coercePubid(self.doctype.publicId or "")))
if self.doctype.systemId:
sysid = self.doctype.systemId
if sysid.find("'") >= 0 and sysid.find('"') >= 0:
warnings.warn("DOCTYPE system cannot contain single and double quotes", DataLossWarning)
sysid = sysid.replace("'", 'U00027')
if sysid.find("'") >= 0:
docStr += '"%s"' % sysid
else:
docStr += "'%s'" % sysid
else:
docStr += "''"
docStr += ">"
if self.doctype.name != token["name"]:
warnings.warn("lxml cannot represent doctype with a different name to the root element", DataLossWarning)
docStr += "<THIS_SHOULD_NEVER_APPEAR_PUBLICLY/>"
try:
root = etree.fromstring(docStr)
except etree.XMLSyntaxError:
print docStr
raise
#Append the initial comments:
root = etree.fromstring(docStr)
# Append the initial comments:
for comment_token in self.initial_comments:
root.addprevious(etree.Comment(comment_token["data"]))
#Create the root document and add the ElementTree to it
# Create the root document and add the ElementTree to it
self.document = self.documentClass()
self.document._elementTree = root.getroottree()
# Give the root element the right name
name = token["name"]
namespace = token.get("namespace", self.defaultNamespace)
if namespace is None:
etree_tag = name
else:
etree_tag = "{%s}%s"%(namespace, name)
etree_tag = "{%s}%s" % (namespace, name)
root.tag = etree_tag
#Add the root element to the internal child/open data structures
# Add the root element to the internal child/open data structures
root_element = self.elementClass(name, namespace)
root_element._element = root
self.document._childNodes.append(root_element)
self.openElements.append(root_element)
#Reset to the default insert comment function
self.insertComment = super(TreeBuilder, self).insertComment
# Reset to the default insert comment function
self.insertComment = self.insertCommentMain

View File

@@ -1,256 +0,0 @@
import _base
from html5lib.constants import voidElements, namespaces, prefixes
from xml.sax.saxutils import escape
# Really crappy basic implementation of a DOM-core like thing
class Node(_base.Node):
type = -1
def __init__(self, name):
self.name = name
self.parent = None
self.value = None
self.childNodes = []
self._flags = []
def __iter__(self):
for node in self.childNodes:
yield node
for item in node:
yield item
def __unicode__(self):
return self.name
def toxml(self):
raise NotImplementedError
def printTree(self, indent=0):
tree = '\n|%s%s' % (' '* indent, unicode(self))
for child in self.childNodes:
tree += child.printTree(indent + 2)
return tree
def appendChild(self, node):
assert isinstance(node, Node)
if (isinstance(node, TextNode) and self.childNodes and
isinstance(self.childNodes[-1], TextNode)):
self.childNodes[-1].value += node.value
else:
self.childNodes.append(node)
node.parent = self
def insertText(self, data, insertBefore=None):
assert isinstance(data, unicode), "data %s is of type %s expected unicode"%(repr(data), type(data))
if insertBefore is None:
self.appendChild(TextNode(data))
else:
self.insertBefore(TextNode(data), insertBefore)
def insertBefore(self, node, refNode):
index = self.childNodes.index(refNode)
if (isinstance(node, TextNode) and index > 0 and
isinstance(self.childNodes[index - 1], TextNode)):
self.childNodes[index - 1].value += node.value
else:
self.childNodes.insert(index, node)
node.parent = self
def removeChild(self, node):
try:
self.childNodes.remove(node)
except:
# XXX
raise
node.parent = None
def cloneNode(self):
raise NotImplementedError
def hasContent(self):
"""Return true if the node has children or text"""
return bool(self.childNodes)
def getNameTuple(self):
if self.namespace == None:
return namespaces["html"], self.name
else:
return self.namespace, self.name
nameTuple = property(getNameTuple)
class Document(Node):
type = 1
def __init__(self):
Node.__init__(self, None)
def __str__(self):
return "#document"
def __unicode__(self):
return str(self)
def appendChild(self, child):
Node.appendChild(self, child)
def toxml(self, encoding="utf=8"):
result = ""
for child in self.childNodes:
result += child.toxml()
return result.encode(encoding)
def hilite(self, encoding="utf-8"):
result = "<pre>"
for child in self.childNodes:
result += child.hilite()
return result.encode(encoding) + "</pre>"
def printTree(self):
tree = unicode(self)
for child in self.childNodes:
tree += child.printTree(2)
return tree
def cloneNode(self):
return Document()
class DocumentFragment(Document):
type = 2
def __str__(self):
return "#document-fragment"
def __unicode__(self):
return str(self)
def cloneNode(self):
return DocumentFragment()
class DocumentType(Node):
type = 3
def __init__(self, name, publicId, systemId):
Node.__init__(self, name)
self.publicId = publicId
self.systemId = systemId
def __unicode__(self):
if self.publicId or self.systemId:
publicId = self.publicId or ""
systemId = self.systemId or ""
return """<!DOCTYPE %s "%s" "%s">"""%(
self.name, publicId, systemId)
else:
return u"<!DOCTYPE %s>" % self.name
toxml = __unicode__
def hilite(self):
return '<code class="markup doctype">&lt;!DOCTYPE %s></code>' % self.name
def cloneNode(self):
return DocumentType(self.name, self.publicId, self.systemId)
class TextNode(Node):
type = 4
def __init__(self, value):
Node.__init__(self, None)
self.value = value
def __unicode__(self):
return u"\"%s\"" % self.value
def toxml(self):
return escape(self.value)
hilite = toxml
def cloneNode(self):
return TextNode(self.value)
class Element(Node):
type = 5
def __init__(self, name, namespace=None):
Node.__init__(self, name)
self.namespace = namespace
self.attributes = {}
def __unicode__(self):
if self.namespace == None:
return u"<%s>" % self.name
else:
return u"<%s %s>"%(prefixes[self.namespace], self.name)
def toxml(self):
result = '<' + self.name
if self.attributes:
for name,value in self.attributes.iteritems():
result += u' %s="%s"' % (name, escape(value,{'"':'&quot;'}))
if self.childNodes:
result += '>'
for child in self.childNodes:
result += child.toxml()
result += u'</%s>' % self.name
else:
result += u'/>'
return result
def hilite(self):
result = '&lt;<code class="markup element-name">%s</code>' % self.name
if self.attributes:
for name, value in self.attributes.iteritems():
result += ' <code class="markup attribute-name">%s</code>=<code class="markup attribute-value">"%s"</code>' % (name, escape(value, {'"':'&quot;'}))
if self.childNodes:
result += ">"
for child in self.childNodes:
result += child.hilite()
elif self.name in voidElements:
return result + ">"
return result + '&lt;/<code class="markup element-name">%s</code>>' % self.name
def printTree(self, indent):
tree = '\n|%s%s' % (' '*indent, unicode(self))
indent += 2
if self.attributes:
for name, value in sorted(self.attributes.iteritems()):
if isinstance(name, tuple):
name = "%s %s"%(name[0], name[1])
tree += '\n|%s%s="%s"' % (' ' * indent, name, value)
for child in self.childNodes:
tree += child.printTree(indent)
return tree
def cloneNode(self):
newNode = Element(self.name)
if hasattr(self, 'namespace'):
newNode.namespace = self.namespace
for attr, value in self.attributes.iteritems():
newNode.attributes[attr] = value
return newNode
class CommentNode(Node):
type = 6
def __init__(self, data):
Node.__init__(self, None)
self.data = data
def __unicode__(self):
return "<!-- %s -->" % self.data
def toxml(self):
return "<!--%s-->" % self.data
def hilite(self):
return '<code class="markup comment">&lt;!--%s--></code>' % escape(self.data)
def cloneNode(self):
return CommentNode(self.data)
class TreeBuilder(_base.TreeBuilder):
documentClass = Document
doctypeClass = DocumentType
elementClass = Element
commentClass = CommentNode
fragmentClass = DocumentFragment
def testSerializer(self, node):
return node.printTree()

View File

@@ -1,236 +0,0 @@
import warnings
warnings.warn("BeautifulSoup 3.x (as of 3.1) is not fully compatible with html5lib and support will be removed in the future", DeprecationWarning)
from BeautifulSoup import BeautifulSoup, Tag, NavigableString, Comment, Declaration
import _base
from html5lib.constants import namespaces, DataLossWarning
class AttrList(object):
def __init__(self, element):
self.element = element
self.attrs = dict(self.element.attrs)
def __iter__(self):
return self.attrs.items().__iter__()
def __setitem__(self, name, value):
"set attr", name, value
self.element[name] = value
def items(self):
return self.attrs.items()
def keys(self):
return self.attrs.keys()
def __getitem__(self, name):
return self.attrs[name]
def __contains__(self, name):
return name in self.attrs.keys()
def __eq__(self, other):
if len(self.keys()) != len(other.keys()):
return False
for item in self.keys():
if item not in other:
return False
if self[item] != other[item]:
return False
return True
class Element(_base.Node):
def __init__(self, element, soup, namespace):
_base.Node.__init__(self, element.name)
self.element = element
self.soup = soup
self.namespace = namespace
def _nodeIndex(self, node, refNode):
# Finds a node by identity rather than equality
for index in range(len(self.element.contents)):
if id(self.element.contents[index]) == id(refNode.element):
return index
return None
def appendChild(self, node):
if (node.element.__class__ == NavigableString and self.element.contents
and self.element.contents[-1].__class__ == NavigableString):
# Concatenate new text onto old text node
# (TODO: This has O(n^2) performance, for input like "a</a>a</a>a</a>...")
newStr = NavigableString(self.element.contents[-1]+node.element)
# Remove the old text node
# (Can't simply use .extract() by itself, because it fails if
# an equal text node exists within the parent node)
oldElement = self.element.contents[-1]
del self.element.contents[-1]
oldElement.parent = None
oldElement.extract()
self.element.insert(len(self.element.contents), newStr)
else:
self.element.insert(len(self.element.contents), node.element)
node.parent = self
def getAttributes(self):
return AttrList(self.element)
def setAttributes(self, attributes):
if attributes:
for name, value in attributes.items():
self.element[name] = value
attributes = property(getAttributes, setAttributes)
def insertText(self, data, insertBefore=None):
text = TextNode(NavigableString(data), self.soup)
if insertBefore:
self.insertBefore(text, insertBefore)
else:
self.appendChild(text)
def insertBefore(self, node, refNode):
index = self._nodeIndex(node, refNode)
if (node.element.__class__ == NavigableString and self.element.contents
and self.element.contents[index-1].__class__ == NavigableString):
# (See comments in appendChild)
newStr = NavigableString(self.element.contents[index-1]+node.element)
oldNode = self.element.contents[index-1]
del self.element.contents[index-1]
oldNode.parent = None
oldNode.extract()
self.element.insert(index-1, newStr)
else:
self.element.insert(index, node.element)
node.parent = self
def removeChild(self, node):
index = self._nodeIndex(node.parent, node)
del node.parent.element.contents[index]
node.element.parent = None
node.element.extract()
node.parent = None
def reparentChildren(self, newParent):
while self.element.contents:
child = self.element.contents[0]
child.extract()
if isinstance(child, Tag):
newParent.appendChild(Element(child, self.soup, namespaces["html"]))
else:
newParent.appendChild(TextNode(child, self.soup))
def cloneNode(self):
node = Element(Tag(self.soup, self.element.name), self.soup, self.namespace)
for key,value in self.attributes:
node.attributes[key] = value
return node
def hasContent(self):
return self.element.contents
def getNameTuple(self):
if self.namespace == None:
return namespaces["html"], self.name
else:
return self.namespace, self.name
nameTuple = property(getNameTuple)
class TextNode(Element):
def __init__(self, element, soup):
_base.Node.__init__(self, None)
self.element = element
self.soup = soup
def cloneNode(self):
raise NotImplementedError
class TreeBuilder(_base.TreeBuilder):
def __init__(self, namespaceHTMLElements):
if namespaceHTMLElements:
warnings.warn("BeautifulSoup cannot represent elements in any namespace", DataLossWarning)
_base.TreeBuilder.__init__(self, namespaceHTMLElements)
def documentClass(self):
self.soup = BeautifulSoup("")
return Element(self.soup, self.soup, None)
def insertDoctype(self, token):
name = token["name"]
publicId = token["publicId"]
systemId = token["systemId"]
if publicId:
self.soup.insert(0, Declaration("DOCTYPE %s PUBLIC \"%s\" \"%s\""%(name, publicId, systemId or "")))
elif systemId:
self.soup.insert(0, Declaration("DOCTYPE %s SYSTEM \"%s\""%
(name, systemId)))
else:
self.soup.insert(0, Declaration("DOCTYPE %s"%name))
def elementClass(self, name, namespace):
if namespace is not None:
warnings.warn("BeautifulSoup cannot represent elements in any namespace", DataLossWarning)
return Element(Tag(self.soup, name), self.soup, namespace)
def commentClass(self, data):
return TextNode(Comment(data), self.soup)
def fragmentClass(self):
self.soup = BeautifulSoup("")
self.soup.name = "[document_fragment]"
return Element(self.soup, self.soup, None)
def appendChild(self, node):
self.soup.insert(len(self.soup.contents), node.element)
def testSerializer(self, element):
return testSerializer(element)
def getDocument(self):
return self.soup
def getFragment(self):
return _base.TreeBuilder.getFragment(self).element
def testSerializer(element):
import re
rv = []
def serializeElement(element, indent=0):
if isinstance(element, Declaration):
doctype_regexp = r'DOCTYPE\s+(?P<name>[^\s]*)( PUBLIC "(?P<publicId>.*)" "(?P<systemId1>.*)"| SYSTEM "(?P<systemId2>.*)")?'
m = re.compile(doctype_regexp).match(element.string)
assert m is not None, "DOCTYPE did not match expected format"
name = m.group('name')
publicId = m.group('publicId')
if publicId is not None:
systemId = m.group('systemId1') or ""
else:
systemId = m.group('systemId2')
if publicId is not None or systemId is not None:
rv.append("""|%s<!DOCTYPE %s "%s" "%s">"""%
(' '*indent, name, publicId or "", systemId or ""))
else:
rv.append("|%s<!DOCTYPE %s>"%(' '*indent, name))
elif isinstance(element, BeautifulSoup):
if element.name == "[document_fragment]":
rv.append("#document-fragment")
else:
rv.append("#document")
elif isinstance(element, Comment):
rv.append("|%s<!-- %s -->"%(' '*indent, element.string))
elif isinstance(element, unicode):
rv.append("|%s\"%s\"" %(' '*indent, element))
else:
rv.append("|%s<%s>"%(' '*indent, element.name))
if element.attrs:
for name, value in sorted(element.attrs):
rv.append('|%s%s="%s"' % (' '*(indent+2), name, value))
indent += 2
if hasattr(element, "contents"):
for child in element.contents:
serializeElement(child, indent)
serializeElement(element, 0)
return "\n".join(rv)

View File

@@ -8,23 +8,27 @@ implements a 'serialize' method taking a tree as sole argument and
returning an iterator generating tokens.
"""
from __future__ import absolute_import, division, unicode_literals
import sys
from ..utils import default_etree
treeWalkerCache = {}
def getTreeWalker(treeType, implementation=None, **kwargs):
"""Get a TreeWalker class for various types of tree with built-in support
treeType - the name of the tree type required (case-insensitive). Supported
values are "simpletree", "dom", "etree" and "beautifulsoup"
values are:
"simpletree" - a built-in DOM-ish tree type with support for some
more pythonic idioms.
"dom" - The xml.dom.minidom DOM implementation
"pulldom" - The xml.dom.pulldom event stream
"etree" - A generic walker for tree implementations exposing an
elementtree-like interface (known to work with
ElementTree, cElementTree and lxml.etree).
"lxml" - Optimized walker for lxml.etree
"beautifulsoup" - Beautiful soup (if installed)
"genshi" - a Genshi stream
implementation - (Currently applies to the "etree" tree type only). A module
@@ -33,20 +37,21 @@ def getTreeWalker(treeType, implementation=None, **kwargs):
treeType = treeType.lower()
if treeType not in treeWalkerCache:
if treeType in ("dom", "pulldom", "simpletree"):
mod = __import__(treeType, globals())
if treeType in ("dom", "pulldom"):
name = "%s.%s" % (__name__, treeType)
__import__(name)
mod = sys.modules[name]
treeWalkerCache[treeType] = mod.TreeWalker
elif treeType == "genshi":
import genshistream
from . import genshistream
treeWalkerCache[treeType] = genshistream.TreeWalker
elif treeType == "beautifulsoup":
import soup
treeWalkerCache[treeType] = soup.TreeWalker
elif treeType == "lxml":
import lxmletree
from . import lxmletree
treeWalkerCache[treeType] = lxmletree.TreeWalker
elif treeType == "etree":
import etree
from . import etree
if implementation is None:
implementation = default_etree
# XXX: NEVER cache here, caching is done in the etree submodule
return etree.getETreeModule(implementation, **kwargs).TreeWalker
return treeWalkerCache.get(treeType)

View File

@@ -1,94 +1,9 @@
from __future__ import absolute_import, division, unicode_literals
from six import text_type, string_types
import gettext
_ = gettext.gettext
from html5lib.constants import voidElements, spaceCharacters
spaceCharacters = u"".join(spaceCharacters)
class TreeWalker(object):
def __init__(self, tree):
self.tree = tree
def __iter__(self):
raise NotImplementedError
def error(self, msg):
return {"type": "SerializeError", "data": msg}
def normalizeAttrs(self, attrs):
newattrs = {}
if attrs:
#TODO: treewalkers should always have attrs
for (namespace,name),value in attrs.iteritems():
namespace = unicode(namespace) if namespace else None
name = unicode(name)
value = unicode(value)
newattrs[(namespace,name)] = value
return newattrs
def emptyTag(self, namespace, name, attrs, hasChildren=False):
yield {"type": "EmptyTag", "name": unicode(name),
"namespace":unicode(namespace),
"data": self.normalizeAttrs(attrs)}
if hasChildren:
yield self.error(_("Void element has children"))
def startTag(self, namespace, name, attrs):
return {"type": "StartTag",
"name": unicode(name),
"namespace":unicode(namespace),
"data": self.normalizeAttrs(attrs)}
def endTag(self, namespace, name):
return {"type": "EndTag",
"name": unicode(name),
"namespace":unicode(namespace),
"data": {}}
def text(self, data):
data = unicode(data)
middle = data.lstrip(spaceCharacters)
left = data[:len(data)-len(middle)]
if left:
yield {"type": "SpaceCharacters", "data": left}
data = middle
middle = data.rstrip(spaceCharacters)
right = data[len(middle):]
if middle:
yield {"type": "Characters", "data": middle}
if right:
yield {"type": "SpaceCharacters", "data": right}
def comment(self, data):
return {"type": "Comment", "data": unicode(data)}
def doctype(self, name, publicId=None, systemId=None, correct=True):
return {"type": "Doctype",
"name": name is not None and unicode(name) or u"",
"publicId": publicId,
"systemId": systemId,
"correct": correct}
def entity(self, name):
return {"type": "Entity", "name": unicode(name)}
def unknown(self, nodeType):
return self.error(_("Unknown node type: ") + nodeType)
class RecursiveTreeWalker(TreeWalker):
def walkChildren(self, node):
raise NodeImplementedError
def element(self, node, namespace, name, attrs, hasChildren):
if name in voidElements:
for token in self.emptyTag(namespace, name, attrs, hasChildren):
yield token
else:
yield self.startTag(name, attrs)
if hasChildren:
for token in self.walkChildren(node):
yield token
yield self.endTag(name)
from xml.dom import Node
DOCUMENT = Node.DOCUMENT_NODE
@@ -99,16 +14,127 @@ COMMENT = Node.COMMENT_NODE
ENTITY = Node.ENTITY_NODE
UNKNOWN = "<#UNKNOWN#>"
from ..constants import voidElements, spaceCharacters
spaceCharacters = "".join(spaceCharacters)
def to_text(s, blank_if_none=True):
"""Wrapper around six.text_type to convert None to empty string"""
if s is None:
if blank_if_none:
return ""
else:
return None
elif isinstance(s, text_type):
return s
else:
return text_type(s)
def is_text_or_none(string):
"""Wrapper around isinstance(string_types) or is None"""
return string is None or isinstance(string, string_types)
class TreeWalker(object):
def __init__(self, tree):
self.tree = tree
def __iter__(self):
raise NotImplementedError
def error(self, msg):
return {"type": "SerializeError", "data": msg}
def emptyTag(self, namespace, name, attrs, hasChildren=False):
assert namespace is None or isinstance(namespace, string_types), type(namespace)
assert isinstance(name, string_types), type(name)
assert all((namespace is None or isinstance(namespace, string_types)) and
isinstance(name, string_types) and
isinstance(value, string_types)
for (namespace, name), value in attrs.items())
yield {"type": "EmptyTag", "name": to_text(name, False),
"namespace": to_text(namespace),
"data": attrs}
if hasChildren:
yield self.error(_("Void element has children"))
def startTag(self, namespace, name, attrs):
assert namespace is None or isinstance(namespace, string_types), type(namespace)
assert isinstance(name, string_types), type(name)
assert all((namespace is None or isinstance(namespace, string_types)) and
isinstance(name, string_types) and
isinstance(value, string_types)
for (namespace, name), value in attrs.items())
return {"type": "StartTag",
"name": text_type(name),
"namespace": to_text(namespace),
"data": dict(((to_text(namespace, False), to_text(name)),
to_text(value, False))
for (namespace, name), value in attrs.items())}
def endTag(self, namespace, name):
assert namespace is None or isinstance(namespace, string_types), type(namespace)
assert isinstance(name, string_types), type(namespace)
return {"type": "EndTag",
"name": to_text(name, False),
"namespace": to_text(namespace),
"data": {}}
def text(self, data):
assert isinstance(data, string_types), type(data)
data = to_text(data)
middle = data.lstrip(spaceCharacters)
left = data[:len(data) - len(middle)]
if left:
yield {"type": "SpaceCharacters", "data": left}
data = middle
middle = data.rstrip(spaceCharacters)
right = data[len(middle):]
if middle:
yield {"type": "Characters", "data": middle}
if right:
yield {"type": "SpaceCharacters", "data": right}
def comment(self, data):
assert isinstance(data, string_types), type(data)
return {"type": "Comment", "data": text_type(data)}
def doctype(self, name, publicId=None, systemId=None, correct=True):
assert is_text_or_none(name), type(name)
assert is_text_or_none(publicId), type(publicId)
assert is_text_or_none(systemId), type(systemId)
return {"type": "Doctype",
"name": to_text(name),
"publicId": to_text(publicId),
"systemId": to_text(systemId),
"correct": to_text(correct)}
def entity(self, name):
assert isinstance(name, string_types), type(name)
return {"type": "Entity", "name": text_type(name)}
def unknown(self, nodeType):
return self.error(_("Unknown node type: ") + nodeType)
class NonRecursiveTreeWalker(TreeWalker):
def getNodeDetails(self, node):
raise NotImplementedError
def getFirstChild(self, node):
raise NotImplementedError
def getNextSibling(self, node):
raise NotImplementedError
def getParentNode(self, node):
raise NotImplementedError
@@ -118,7 +144,6 @@ class NonRecursiveTreeWalker(TreeWalker):
details = self.getNodeDetails(currentNode)
type, details = details[0], details[1:]
hasChildren = False
endTag = None
if type == DOCTYPE:
yield self.doctype(*details)
@@ -130,12 +155,11 @@ class NonRecursiveTreeWalker(TreeWalker):
elif type == ELEMENT:
namespace, name, attributes, hasChildren = details
if name in voidElements:
for token in self.emptyTag(namespace, name, attributes,
for token in self.emptyTag(namespace, name, attributes,
hasChildren):
yield token
hasChildren = False
else:
endTag = name
yield self.startTag(namespace, name, attributes)
elif type == COMMENT:
@@ -149,12 +173,12 @@ class NonRecursiveTreeWalker(TreeWalker):
else:
yield self.unknown(details[0])
if hasChildren:
firstChild = self.getFirstChild(currentNode)
else:
firstChild = None
if firstChild is not None:
currentNode = firstChild
else:

View File

@@ -1,10 +1,12 @@
from __future__ import absolute_import, division, unicode_literals
from xml.dom import Node
import gettext
_ = gettext.gettext
import _base
from html5lib.constants import voidElements
from . import _base
class TreeWalker(_base.NonRecursiveTreeWalker):
def getNodeDetails(self, node):
@@ -16,10 +18,13 @@ class TreeWalker(_base.NonRecursiveTreeWalker):
elif node.nodeType == Node.ELEMENT_NODE:
attrs = {}
for attr in node.attributes.keys():
for attr in list(node.attributes.keys()):
attr = node.getAttributeNode(attr)
attrs[(attr.namespaceURI,attr.localName)] = attr.value
return (_base.ELEMENT, node.namespaceURI, node.nodeName,
if attr.namespaceURI:
attrs[(attr.namespaceURI, attr.localName)] = attr.value
else:
attrs[(None, attr.name)] = attr.value
return (_base.ELEMENT, node.namespaceURI, node.nodeName,
attrs, node.hasChildNodes())
elif node.nodeType == Node.COMMENT_NODE:

View File

@@ -1,33 +1,28 @@
from __future__ import absolute_import, division, unicode_literals
try:
from collections import OrderedDict
except ImportError:
try:
from ordereddict import OrderedDict
except ImportError:
OrderedDict = dict
import gettext
_ = gettext.gettext
try:
from types import ModuleType
except:
from new import module as ModuleType
import copy
import re
import _base
from html5lib.constants import voidElements
from six import text_type
from . import _base
from ..utils import moduleFactoryFactory
tag_regexp = re.compile("{([^}]*)}(.*)")
moduleCache = {}
def getETreeModule(ElementTreeImplementation):
name = "_" + ElementTreeImplementation.__name__+"builder"
if name in moduleCache:
return moduleCache[name]
else:
mod = ModuleType("_" + ElementTreeImplementation.__name__+"builder")
objs = getETreeBuilder(ElementTreeImplementation)
mod.__dict__.update(objs)
moduleCache[name] = mod
return mod
def getETreeBuilder(ElementTreeImplementation):
ElementTree = ElementTreeImplementation
ElementTreeCommentType = ElementTree.Comment("asd").tag
class TreeWalker(_base.NonRecursiveTreeWalker):
"""Given the particular ElementTree representation, this implementation,
@@ -35,16 +30,16 @@ def getETreeBuilder(ElementTreeImplementation):
content:
1. The current element
2. The index of the element relative to its parent
3. A stack of ancestor elements
4. A flag "text", "tail" or None to indicate if the current node is a
text node; either the text or tail of the current element (1)
"""
def getNodeDetails(self, node):
if isinstance(node, tuple): # It might be the root Element
if isinstance(node, tuple): # It might be the root Element
elt, key, parents, flag = node
if flag in ("text", "tail"):
return _base.TEXT, getattr(elt, flag)
@@ -54,41 +49,41 @@ def getETreeBuilder(ElementTreeImplementation):
if not(hasattr(node, "tag")):
node = node.getroot()
if node.tag in ("<DOCUMENT_ROOT>", "<DOCUMENT_FRAGMENT>"):
if node.tag in ("DOCUMENT_ROOT", "DOCUMENT_FRAGMENT"):
return (_base.DOCUMENT,)
elif node.tag == "<!DOCTYPE>":
return (_base.DOCTYPE, node.text,
return (_base.DOCTYPE, node.text,
node.get("publicId"), node.get("systemId"))
elif node.tag == ElementTree.Comment:
elif node.tag == ElementTreeCommentType:
return _base.COMMENT, node.text
else:
assert type(node.tag) in (str, unicode), type(node.tag)
#This is assumed to be an ordinary element
assert type(node.tag) == text_type, type(node.tag)
# This is assumed to be an ordinary element
match = tag_regexp.match(node.tag)
if match:
namespace, tag = match.groups()
else:
namespace = None
tag = node.tag
attrs = {}
for name, value in node.attrib.items():
attrs = OrderedDict()
for name, value in list(node.attrib.items()):
match = tag_regexp.match(name)
if match:
attrs[(match.group(1),match.group(2))] = value
attrs[(match.group(1), match.group(2))] = value
else:
attrs[(None,name)] = value
return (_base.ELEMENT, namespace, tag,
attrs[(None, name)] = value
return (_base.ELEMENT, namespace, tag,
attrs, len(node) or node.text)
def getFirstChild(self, node):
if isinstance(node, tuple):
element, key, parents, flag = node
else:
element, key, parents, flag = node, None, [], None
if flag in ("text", "tail"):
return None
else:
@@ -99,13 +94,13 @@ def getETreeBuilder(ElementTreeImplementation):
return element[0], 0, parents, None
else:
return None
def getNextSibling(self, node):
if isinstance(node, tuple):
element, key, parents, flag = node
else:
return None
if flag == "text":
if len(element):
parents.append(element)
@@ -116,16 +111,16 @@ def getETreeBuilder(ElementTreeImplementation):
if element.tail and flag != "tail":
return element, key, parents, "tail"
elif key < len(parents[-1]) - 1:
return parents[-1][key+1], key+1, parents, None
return parents[-1][key + 1], key + 1, parents, None
else:
return None
def getParentNode(self, node):
if isinstance(node, tuple):
element, key, parents, flag = node
else:
return None
if flag == "text":
if not parents:
return element
@@ -139,3 +134,5 @@ def getETreeBuilder(ElementTreeImplementation):
return parent, list(parents[-1]).index(parent), parents, None
return locals()
getETreeModule = moduleFactoryFactory(getETreeBuilder)

View File

@@ -1,50 +1,49 @@
from __future__ import absolute_import, division, unicode_literals
from genshi.core import QName
from genshi.core import START, END, XML_NAMESPACE, DOCTYPE, TEXT
from genshi.core import START_NS, END_NS, START_CDATA, END_CDATA, PI, COMMENT
from genshi.output import NamespaceFlattener
from genshi.core import START_NS, END_NS, START_CDATA, END_CDATA, PI, COMMENT
import _base
from . import _base
from ..constants import voidElements, namespaces
from html5lib.constants import voidElements
class TreeWalker(_base.TreeWalker):
def __iter__(self):
depth = 0
ignore_until = None
# Buffer the events so we can pass in the following one
previous = None
for event in self.tree:
if previous is not None:
if previous[0] == START:
depth += 1
if ignore_until <= depth:
ignore_until = None
if ignore_until is None:
for token in self.tokens(previous, event):
yield token
if token["type"] == "EmptyTag":
ignore_until = depth
if previous[0] == END:
depth -= 1
previous = event
if previous is not None:
if ignore_until is None or ignore_until <= depth:
for token in self.tokens(previous, None):
for token in self.tokens(previous, event):
yield token
elif ignore_until is not None:
raise ValueError("Illformed DOM event stream: void element without END_ELEMENT")
previous = event
# Don't forget the final event!
if previous is not None:
for token in self.tokens(previous, None):
yield token
def tokens(self, event, next):
kind, data, pos = event
if kind == START:
tag, attrib = data
tag, attribs = data
name = tag.localname
namespace = tag.namespace
if tag in voidElements:
for token in self.emptyTag(namespace, name, list(attrib),
not next or next[0] != END
converted_attribs = {}
for k, v in attribs:
if isinstance(k, QName):
converted_attribs[(k.namespace, k.localname)] = v
else:
converted_attribs[(None, k)] = v
if namespace == namespaces["html"] and name in voidElements:
for token in self.emptyTag(namespace, name, converted_attribs,
not next or next[0] != END
or next[1] != tag):
yield token
else:
yield self.startTag(namespace, name, list(attrib))
yield self.startTag(namespace, name, converted_attribs)
elif kind == END:
name = data.localname
@@ -62,8 +61,8 @@ class TreeWalker(_base.TreeWalker):
elif kind == DOCTYPE:
yield self.doctype(*data)
elif kind in (XML_NAMESPACE, DOCTYPE, START_NS, END_NS, \
START_CDATA, END_CDATA, PI):
elif kind in (XML_NAMESPACE, DOCTYPE, START_NS, END_NS,
START_CDATA, END_CDATA, PI):
pass
else:

View File

@@ -1,186 +1,208 @@
from lxml import etree
from html5lib.treebuilders.etree import tag_regexp
from gettext import gettext
_ = gettext
import _base
from html5lib.constants import voidElements
from html5lib import ihatexml
class Root(object):
def __init__(self, et):
self.elementtree = et
self.children = []
if et.docinfo.internalDTD:
self.children.append(Doctype(self, et.docinfo.root_name,
et.docinfo.public_id,
et.docinfo.system_url))
root = et.getroot()
node = root
while node.getprevious() is not None:
node = node.getprevious()
while node is not None:
self.children.append(node)
node = node.getnext()
self.text = None
self.tail = None
def __getitem__(self, key):
return self.children[key]
def getnext(self):
return None
def __len__(self):
return 1
class Doctype(object):
def __init__(self, root_node, name, public_id, system_id):
self.root_node = root_node
self.name = name
self.public_id = public_id
self.system_id = system_id
self.text = None
self.tail = None
def getnext(self):
return self.root_node.children[1]
class FragmentRoot(Root):
def __init__(self, children):
self.children = [FragmentWrapper(self, child) for child in children]
self.text = self.tail = None
def getnext(self):
return None
class FragmentWrapper(object):
def __init__(self, fragment_root, obj):
self.root_node = fragment_root
self.obj = obj
if hasattr(self.obj, 'text'):
self.text = self.obj.text
else:
self.text = None
if hasattr(self.obj, 'tail'):
self.tail = self.obj.tail
else:
self.tail = None
self.isstring = isinstance(obj, basestring)
def __getattr__(self, name):
return getattr(self.obj, name)
def getnext(self):
siblings = self.root_node.children
idx = siblings.index(self)
if idx < len(siblings) - 1:
return siblings[idx + 1]
else:
return None
def __getitem__(self, key):
return self.obj[key]
def __nonzero__(self):
return bool(self.obj)
def getparent(self):
return None
def __str__(self):
return str(self.obj)
def __unicode__(self):
return unicode(self.obj)
def __len__(self):
return len(self.obj)
class TreeWalker(_base.NonRecursiveTreeWalker):
def __init__(self, tree):
if hasattr(tree, "getroot"):
tree = Root(tree)
elif isinstance(tree, list):
tree = FragmentRoot(tree)
_base.NonRecursiveTreeWalker.__init__(self, tree)
self.filter = ihatexml.InfosetFilter()
def getNodeDetails(self, node):
if isinstance(node, tuple): # Text node
node, key = node
assert key in ("text", "tail"), _("Text nodes are text or tail, found %s") % key
return _base.TEXT, getattr(node, key)
elif isinstance(node, Root):
return (_base.DOCUMENT,)
elif isinstance(node, Doctype):
return _base.DOCTYPE, node.name, node.public_id, node.system_id
elif isinstance(node, FragmentWrapper) and node.isstring:
return _base.TEXT, node
elif node.tag == etree.Comment:
return _base.COMMENT, node.text
elif node.tag == etree.Entity:
return _base.ENTITY, node.text[1:-1] # strip &;
else:
#This is assumed to be an ordinary element
match = tag_regexp.match(node.tag)
if match:
namespace, tag = match.groups()
else:
namespace = None
tag = node.tag
attrs = {}
for name, value in node.attrib.items():
match = tag_regexp.match(name)
if match:
attrs[(match.group(1),match.group(2))] = value
else:
attrs[(None,name)] = value
return (_base.ELEMENT, namespace, self.filter.fromXmlName(tag),
attrs, len(node) > 0 or node.text)
def getFirstChild(self, node):
assert not isinstance(node, tuple), _("Text nodes have no children")
assert len(node) or node.text, "Node has no children"
if node.text:
return (node, "text")
else:
return node[0]
def getNextSibling(self, node):
if isinstance(node, tuple): # Text node
node, key = node
assert key in ("text", "tail"), _("Text nodes are text or tail, found %s") % key
if key == "text":
# XXX: we cannot use a "bool(node) and node[0] or None" construct here
# because node[0] might evaluate to False if it has no child element
if len(node):
return node[0]
else:
return None
else: # tail
return node.getnext()
return node.tail and (node, "tail") or node.getnext()
def getParentNode(self, node):
if isinstance(node, tuple): # Text node
node, key = node
assert key in ("text", "tail"), _("Text nodes are text or tail, found %s") % key
if key == "text":
return node
# else: fallback to "normal" processing
return node.getparent()
from __future__ import absolute_import, division, unicode_literals
from six import text_type
from lxml import etree
from ..treebuilders.etree import tag_regexp
from gettext import gettext
_ = gettext
from . import _base
from .. import ihatexml
def ensure_str(s):
if s is None:
return None
elif isinstance(s, text_type):
return s
else:
return s.decode("utf-8", "strict")
class Root(object):
def __init__(self, et):
self.elementtree = et
self.children = []
if et.docinfo.internalDTD:
self.children.append(Doctype(self,
ensure_str(et.docinfo.root_name),
ensure_str(et.docinfo.public_id),
ensure_str(et.docinfo.system_url)))
root = et.getroot()
node = root
while node.getprevious() is not None:
node = node.getprevious()
while node is not None:
self.children.append(node)
node = node.getnext()
self.text = None
self.tail = None
def __getitem__(self, key):
return self.children[key]
def getnext(self):
return None
def __len__(self):
return 1
class Doctype(object):
def __init__(self, root_node, name, public_id, system_id):
self.root_node = root_node
self.name = name
self.public_id = public_id
self.system_id = system_id
self.text = None
self.tail = None
def getnext(self):
return self.root_node.children[1]
class FragmentRoot(Root):
def __init__(self, children):
self.children = [FragmentWrapper(self, child) for child in children]
self.text = self.tail = None
def getnext(self):
return None
class FragmentWrapper(object):
def __init__(self, fragment_root, obj):
self.root_node = fragment_root
self.obj = obj
if hasattr(self.obj, 'text'):
self.text = ensure_str(self.obj.text)
else:
self.text = None
if hasattr(self.obj, 'tail'):
self.tail = ensure_str(self.obj.tail)
else:
self.tail = None
self.isstring = isinstance(obj, str) or isinstance(obj, bytes)
# Support for bytes here is Py2
if self.isstring:
self.obj = ensure_str(self.obj)
def __getattr__(self, name):
return getattr(self.obj, name)
def getnext(self):
siblings = self.root_node.children
idx = siblings.index(self)
if idx < len(siblings) - 1:
return siblings[idx + 1]
else:
return None
def __getitem__(self, key):
return self.obj[key]
def __bool__(self):
return bool(self.obj)
def getparent(self):
return None
def __str__(self):
return str(self.obj)
def __unicode__(self):
return str(self.obj)
def __len__(self):
return len(self.obj)
class TreeWalker(_base.NonRecursiveTreeWalker):
def __init__(self, tree):
if hasattr(tree, "getroot"):
tree = Root(tree)
elif isinstance(tree, list):
tree = FragmentRoot(tree)
_base.NonRecursiveTreeWalker.__init__(self, tree)
self.filter = ihatexml.InfosetFilter()
def getNodeDetails(self, node):
if isinstance(node, tuple): # Text node
node, key = node
assert key in ("text", "tail"), _("Text nodes are text or tail, found %s") % key
return _base.TEXT, ensure_str(getattr(node, key))
elif isinstance(node, Root):
return (_base.DOCUMENT,)
elif isinstance(node, Doctype):
return _base.DOCTYPE, node.name, node.public_id, node.system_id
elif isinstance(node, FragmentWrapper) and node.isstring:
return _base.TEXT, node.obj
elif node.tag == etree.Comment:
return _base.COMMENT, ensure_str(node.text)
elif node.tag == etree.Entity:
return _base.ENTITY, ensure_str(node.text)[1:-1] # strip &;
else:
# This is assumed to be an ordinary element
match = tag_regexp.match(ensure_str(node.tag))
if match:
namespace, tag = match.groups()
else:
namespace = None
tag = ensure_str(node.tag)
attrs = {}
for name, value in list(node.attrib.items()):
name = ensure_str(name)
value = ensure_str(value)
match = tag_regexp.match(name)
if match:
attrs[(match.group(1), match.group(2))] = value
else:
attrs[(None, name)] = value
return (_base.ELEMENT, namespace, self.filter.fromXmlName(tag),
attrs, len(node) > 0 or node.text)
def getFirstChild(self, node):
assert not isinstance(node, tuple), _("Text nodes have no children")
assert len(node) or node.text, "Node has no children"
if node.text:
return (node, "text")
else:
return node[0]
def getNextSibling(self, node):
if isinstance(node, tuple): # Text node
node, key = node
assert key in ("text", "tail"), _("Text nodes are text or tail, found %s") % key
if key == "text":
# XXX: we cannot use a "bool(node) and node[0] or None" construct here
# because node[0] might evaluate to False if it has no child element
if len(node):
return node[0]
else:
return None
else: # tail
return node.getnext()
return (node, "tail") if node.tail else node.getnext()
def getParentNode(self, node):
if isinstance(node, tuple): # Text node
node, key = node
assert key in ("text", "tail"), _("Text nodes are text or tail, found %s") % key
if key == "text":
return node
# else: fallback to "normal" processing
return node.getparent()

View File

@@ -1,9 +1,12 @@
from __future__ import absolute_import, division, unicode_literals
from xml.dom.pulldom import START_ELEMENT, END_ELEMENT, \
COMMENT, IGNORABLE_WHITESPACE, CHARACTERS
import _base
from . import _base
from ..constants import voidElements
from html5lib.constants import voidElements
class TreeWalker(_base.TreeWalker):
def __iter__(self):
@@ -11,7 +14,7 @@ class TreeWalker(_base.TreeWalker):
previous = None
for event in self.tree:
if previous is not None and \
(ignore_until is None or previous[1] is ignore_until):
(ignore_until is None or previous[1] is ignore_until):
if previous[1] is ignore_until:
ignore_until = None
for token in self.tokens(previous, event):
@@ -31,9 +34,9 @@ class TreeWalker(_base.TreeWalker):
name = node.nodeName
namespace = node.namespaceURI
attrs = {}
for attr in node.attributes.keys():
for attr in list(node.attributes.keys()):
attr = node.getAttributeNode(attr)
attrs[(attr.namespaceURI,attr.localName)] = attr.value
attrs[(attr.namespaceURI, attr.localName)] = attr.value
if name in voidElements:
for token in self.emptyTag(namespace,
name,

View File

@@ -1,78 +0,0 @@
import gettext
_ = gettext.gettext
import _base
class TreeWalker(_base.NonRecursiveTreeWalker):
"""Given that simpletree has no performant way of getting a node's
next sibling, this implementation returns "nodes" as tuples with the
following content:
1. The parent Node (Element, Document or DocumentFragment)
2. The child index of the current node in its parent's children list
3. A list used as a stack of all ancestors. It is a pair tuple whose
first item is a parent Node and second item is a child index.
"""
def getNodeDetails(self, node):
if isinstance(node, tuple): # It might be the root Node
parent, idx, parents = node
node = parent.childNodes[idx]
# testing node.type allows us not to import treebuilders.simpletree
if node.type in (1, 2): # Document or DocumentFragment
return (_base.DOCUMENT,)
elif node.type == 3: # DocumentType
return _base.DOCTYPE, node.name, node.publicId, node.systemId
elif node.type == 4: # TextNode
return _base.TEXT, node.value
elif node.type == 5: # Element
attrs = {}
for name, value in node.attributes.items():
if isinstance(name, tuple):
attrs[(name[2],name[1])] = value
else:
attrs[(None,name)] = value
return (_base.ELEMENT, node.namespace, node.name,
attrs, node.hasContent())
elif node.type == 6: # CommentNode
return _base.COMMENT, node.data
else:
return _node.UNKNOWN, node.type
def getFirstChild(self, node):
if isinstance(node, tuple): # It might be the root Node
parent, idx, parents = node
parents.append((parent, idx))
node = parent.childNodes[idx]
else:
parents = []
assert node.hasContent(), "Node has no children"
return (node, 0, parents)
def getNextSibling(self, node):
assert isinstance(node, tuple), "Node is not a tuple: " + str(node)
parent, idx, parents = node
idx += 1
if len(parent.childNodes) > idx:
return (parent, idx, parents)
else:
return None
def getParentNode(self, node):
assert isinstance(node, tuple)
parent, idx, parents = node
if parents:
parent, idx = parents.pop()
return parent, idx, parents
else:
# HACK: We could return ``parent`` but None will stop the algorithm the same way
return None

View File

@@ -1,60 +0,0 @@
import re
import gettext
_ = gettext.gettext
from BeautifulSoup import BeautifulSoup, Declaration, Comment, Tag
from html5lib.constants import namespaces
import _base
class TreeWalker(_base.NonRecursiveTreeWalker):
doctype_regexp = re.compile(
r'DOCTYPE\s+(?P<name>[^\s]*)(\s*PUBLIC\s*"(?P<publicId>.*)"\s*"(?P<systemId1>.*)"|\s*SYSTEM\s*"(?P<systemId2>.*)")?')
def getNodeDetails(self, node):
if isinstance(node, BeautifulSoup): # Document or DocumentFragment
return (_base.DOCUMENT,)
elif isinstance(node, Declaration): # DocumentType
string = unicode(node.string)
#Slice needed to remove markup added during unicode conversion,
#but only in some versions of BeautifulSoup/Python
if string.startswith('<!') and string.endswith('>'):
string = string[2:-1]
m = self.doctype_regexp.match(string)
#This regexp approach seems wrong and fragile
#but beautiful soup stores the doctype as a single thing and we want the seperate bits
#It should work as long as the tree is created by html5lib itself but may be wrong if it's
#been modified at all
#We could just feed to it a html5lib tokenizer, I guess...
assert m is not None, "DOCTYPE did not match expected format"
name = m.group('name')
publicId = m.group('publicId')
if publicId is not None:
systemId = m.group('systemId1')
else:
systemId = m.group('systemId2')
return _base.DOCTYPE, name, publicId or "", systemId or ""
elif isinstance(node, Comment):
string = unicode(node.string)
if string.startswith('<!--') and string.endswith('-->'):
string = string[4:-3]
return _base.COMMENT, string
elif isinstance(node, unicode): # TextNode
return _base.TEXT, node
elif isinstance(node, Tag): # Element
return (_base.ELEMENT, namespaces["html"], node.name,
dict(node.attrs).items(), node.contents)
else:
return _base.UNKNOWN, node.__class__.__name__
def getFirstChild(self, node):
return node.contents[0]
def getNextSibling(self, node):
return node.nextSibling
def getParentNode(self, node):
return node.parent

View File

@@ -0,0 +1,12 @@
from __future__ import absolute_import, division, unicode_literals
from .py import Trie as PyTrie
Trie = PyTrie
try:
from .datrie import Trie as DATrie
except ImportError:
pass
else:
Trie = DATrie

View File

@@ -0,0 +1,37 @@
from __future__ import absolute_import, division, unicode_literals
from collections import Mapping
class Trie(Mapping):
"""Abstract base class for tries"""
def keys(self, prefix=None):
keys = super().keys()
if prefix is None:
return set(keys)
# Python 2.6: no set comprehensions
return set([x for x in keys if x.startswith(prefix)])
def has_keys_with_prefix(self, prefix):
for key in self.keys():
if key.startswith(prefix):
return True
return False
def longest_prefix(self, prefix):
if prefix in self:
return prefix
for i in range(1, len(prefix) + 1):
if prefix[:-i] in self:
return prefix[:-i]
raise KeyError(prefix)
def longest_prefix_item(self, prefix):
lprefix = self.longest_prefix(prefix)
return (lprefix, self[lprefix])

View File

@@ -0,0 +1,44 @@
from __future__ import absolute_import, division, unicode_literals
from datrie import Trie as DATrie
from six import text_type
from ._base import Trie as ABCTrie
class Trie(ABCTrie):
def __init__(self, data):
chars = set()
for key in data.keys():
if not isinstance(key, text_type):
raise TypeError("All keys must be strings")
for char in key:
chars.add(char)
self._data = DATrie("".join(chars))
for key, value in data.items():
self._data[key] = value
def __contains__(self, key):
return key in self._data
def __len__(self):
return len(self._data)
def __iter__(self):
raise NotImplementedError()
def __getitem__(self, key):
return self._data[key]
def keys(self, prefix=None):
return self._data.keys(prefix)
def has_keys_with_prefix(self, prefix):
return self._data.has_keys_with_prefix(prefix)
def longest_prefix(self, prefix):
return self._data.longest_prefix(prefix)
def longest_prefix_item(self, prefix):
return self._data.longest_prefix_item(prefix)

67
libs/html5lib/trie/py.py Normal file
View File

@@ -0,0 +1,67 @@
from __future__ import absolute_import, division, unicode_literals
from six import text_type
from bisect import bisect_left
from ._base import Trie as ABCTrie
class Trie(ABCTrie):
def __init__(self, data):
if not all(isinstance(x, text_type) for x in data.keys()):
raise TypeError("All keys must be strings")
self._data = data
self._keys = sorted(data.keys())
self._cachestr = ""
self._cachepoints = (0, len(data))
def __contains__(self, key):
return key in self._data
def __len__(self):
return len(self._data)
def __iter__(self):
return iter(self._data)
def __getitem__(self, key):
return self._data[key]
def keys(self, prefix=None):
if prefix is None or prefix == "" or not self._keys:
return set(self._keys)
if prefix.startswith(self._cachestr):
lo, hi = self._cachepoints
start = i = bisect_left(self._keys, prefix, lo, hi)
else:
start = i = bisect_left(self._keys, prefix)
keys = set()
if start == len(self._keys):
return keys
while self._keys[i].startswith(prefix):
keys.add(self._keys[i])
i += 1
self._cachestr = prefix
self._cachepoints = (start, i)
return keys
def has_keys_with_prefix(self, prefix):
if prefix in self._data:
return True
if prefix.startswith(self._cachestr):
lo, hi = self._cachepoints
i = bisect_left(self._keys, prefix, lo, hi)
else:
i = bisect_left(self._keys, prefix)
if i == len(self._keys):
return False
return self._keys[i].startswith(prefix)

View File

@@ -1,9 +1,16 @@
from __future__ import absolute_import, division, unicode_literals
from types import ModuleType
try:
frozenset
except NameError:
#Import from the sets module for python 2.3
from sets import Set as set
from sets import ImmutableSet as frozenset
import xml.etree.cElementTree as default_etree
except ImportError:
import xml.etree.ElementTree as default_etree
__all__ = ["default_etree", "MethodDispatcher", "isSurrogatePair",
"surrogatePairToCodepoint", "moduleFactoryFactory"]
class MethodDispatcher(dict):
"""Dict with 2 special properties:
@@ -23,7 +30,7 @@ class MethodDispatcher(dict):
# twice as fast. Please do careful performance testing before changing
# anything here.
_dictEntries = []
for name,value in items:
for name, value in items:
if type(name) in (list, tuple, frozenset, set):
for item in name:
_dictEntries.append((item, value))
@@ -35,141 +42,41 @@ class MethodDispatcher(dict):
def __getitem__(self, key):
return dict.get(self, key, self.default)
#Pure python implementation of deque taken from the ASPN Python Cookbook
#Original code by Raymond Hettinger
class deque(object):
# Some utility functions to dal with weirdness around UCS2 vs UCS4
# python builds
def __init__(self, iterable=(), maxsize=-1):
if not hasattr(self, 'data'):
self.left = self.right = 0
self.data = {}
self.maxsize = maxsize
self.extend(iterable)
def append(self, x):
self.data[self.right] = x
self.right += 1
if self.maxsize != -1 and len(self) > self.maxsize:
self.popleft()
def appendleft(self, x):
self.left -= 1
self.data[self.left] = x
if self.maxsize != -1 and len(self) > self.maxsize:
self.pop()
def pop(self):
if self.left == self.right:
raise IndexError('cannot pop from empty deque')
self.right -= 1
elem = self.data[self.right]
del self.data[self.right]
return elem
def popleft(self):
if self.left == self.right:
raise IndexError('cannot pop from empty deque')
elem = self.data[self.left]
del self.data[self.left]
self.left += 1
return elem
def clear(self):
self.data.clear()
self.left = self.right = 0
def extend(self, iterable):
for elem in iterable:
self.append(elem)
def extendleft(self, iterable):
for elem in iterable:
self.appendleft(elem)
def rotate(self, n=1):
if self:
n %= len(self)
for i in xrange(n):
self.appendleft(self.pop())
def __getitem__(self, i):
if i < 0:
i += len(self)
try:
return self.data[i + self.left]
except KeyError:
raise IndexError
def __setitem__(self, i, value):
if i < 0:
i += len(self)
try:
self.data[i + self.left] = value
except KeyError:
raise IndexError
def __delitem__(self, i):
size = len(self)
if not (-size <= i < size):
raise IndexError
data = self.data
if i < 0:
i += size
for j in xrange(self.left+i, self.right-1):
data[j] = data[j+1]
self.pop()
def __len__(self):
return self.right - self.left
def __cmp__(self, other):
if type(self) != type(other):
return cmp(type(self), type(other))
return cmp(list(self), list(other))
def __repr__(self, _track=[]):
if id(self) in _track:
return '...'
_track.append(id(self))
r = 'deque(%r)' % (list(self),)
_track.remove(id(self))
return r
def __getstate__(self):
return (tuple(self),)
def __setstate__(self, s):
self.__init__(s[0])
def __hash__(self):
raise TypeError
def __copy__(self):
return self.__class__(self)
def __deepcopy__(self, memo={}):
from copy import deepcopy
result = self.__class__()
memo[id(self)] = result
result.__init__(deepcopy(tuple(self), memo))
return result
#Some utility functions to dal with weirdness around UCS2 vs UCS4
#python builds
def encodingType():
if len() == 2:
return "UCS2"
else:
return "UCS4"
def isSurrogatePair(data):
def isSurrogatePair(data):
return (len(data) == 2 and
ord(data[0]) >= 0xD800 and ord(data[0]) <= 0xDBFF and
ord(data[1]) >= 0xDC00 and ord(data[1]) <= 0xDFFF)
def surrogatePairToCodepoint(data):
char_val = (0x10000 + (ord(data[0]) - 0xD800) * 0x400 +
char_val = (0x10000 + (ord(data[0]) - 0xD800) * 0x400 +
(ord(data[1]) - 0xDC00))
return char_val
# Module Factory Factory (no, this isn't Java, I know)
# Here to stop this being duplicated all over the place.
def moduleFactoryFactory(factory):
moduleCache = {}
def moduleFactory(baseModule, *args, **kwargs):
if isinstance(ModuleType.__name__, type("")):
name = "_%s_factory" % baseModule.__name__
else:
name = b"_%s_factory" % baseModule.__name__
if name in moduleCache:
return moduleCache[name]
else:
mod = ModuleType(name)
objs = factory(baseModule, *args, **kwargs)
mod.__dict__.update(objs)
moduleCache[name] = mod
return mod
return moduleFactory

File diff suppressed because it is too large Load Diff

739
libs/httplib2/cacerts.txt Normal file
View File

@@ -0,0 +1,739 @@
# Certifcate Authority certificates for validating SSL connections.
#
# This file contains PEM format certificates generated from
# http://mxr.mozilla.org/seamonkey/source/security/nss/lib/ckfw/builtins/certdata.txt
#
# ***** BEGIN LICENSE BLOCK *****
# Version: MPL 1.1/GPL 2.0/LGPL 2.1
#
# The contents of this file are subject to the Mozilla Public License Version
# 1.1 (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# http://www.mozilla.org/MPL/
#
# Software distributed under the License is distributed on an "AS IS" basis,
# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
# for the specific language governing rights and limitations under the
# License.
#
# The Original Code is the Netscape security libraries.
#
# The Initial Developer of the Original Code is
# Netscape Communications Corporation.
# Portions created by the Initial Developer are Copyright (C) 1994-2000
# the Initial Developer. All Rights Reserved.
#
# Contributor(s):
#
# Alternatively, the contents of this file may be used under the terms of
# either the GNU General Public License Version 2 or later (the "GPL"), or
# the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
# in which case the provisions of the GPL or the LGPL are applicable instead
# of those above. If you wish to allow use of your version of this file only
# under the terms of either the GPL or the LGPL, and not to allow others to
# use your version of this file under the terms of the MPL, indicate your
# decision by deleting the provisions above and replace them with the notice
# and other provisions required by the GPL or the LGPL. If you do not delete
# the provisions above, a recipient may use your version of this file under
# the terms of any one of the MPL, the GPL or the LGPL.
#
# ***** END LICENSE BLOCK *****
Verisign/RSA Secure Server CA
=============================
-----BEGIN CERTIFICATE-----
MIICNDCCAaECEAKtZn5ORf5eV288mBle3cAwDQYJKoZIhvcNAQECBQAwXzELMAkG
A1UEBhMCVVMxIDAeBgNVBAoTF1JTQSBEYXRhIFNlY3VyaXR5LCBJbmMuMS4wLAYD
VQQLEyVTZWN1cmUgU2VydmVyIENlcnRpZmljYXRpb24gQXV0aG9yaXR5MB4XDTk0
MTEwOTAwMDAwMFoXDTEwMDEwNzIzNTk1OVowXzELMAkGA1UEBhMCVVMxIDAeBgNV
BAoTF1JTQSBEYXRhIFNlY3VyaXR5LCBJbmMuMS4wLAYDVQQLEyVTZWN1cmUgU2Vy
dmVyIENlcnRpZmljYXRpb24gQXV0aG9yaXR5MIGbMA0GCSqGSIb3DQEBAQUAA4GJ
ADCBhQJ+AJLOesGugz5aqomDV6wlAXYMra6OLDfO6zV4ZFQD5YRAUcm/jwjiioII
0haGN1XpsSECrXZogZoFokvJSyVmIlZsiAeP94FZbYQHZXATcXY+m3dM41CJVphI
uR2nKRoTLkoRWZweFdVJVCxzOmmCsZc5nG1wZ0jl3S3WyB57AgMBAAEwDQYJKoZI
hvcNAQECBQADfgBl3X7hsuyw4jrg7HFGmhkRuNPHoLQDQCYCPgmc4RKz0Vr2N6W3
YQO2WxZpO8ZECAyIUwxrl0nHPjXcbLm7qt9cuzovk2C2qUtN8iD3zV9/ZHuO3ABc
1/p3yjkWWW8O6tO1g39NTUJWdrTJXwT4OPjr0l91X817/OWOgHz8UA==
-----END CERTIFICATE-----
Thawte Personal Basic CA
========================
-----BEGIN CERTIFICATE-----
MIIDITCCAoqgAwIBAgIBADANBgkqhkiG9w0BAQQFADCByzELMAkGA1UEBhMCWkEx
FTATBgNVBAgTDFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJQ2FwZSBUb3duMRowGAYD
VQQKExFUaGF3dGUgQ29uc3VsdGluZzEoMCYGA1UECxMfQ2VydGlmaWNhdGlvbiBT
ZXJ2aWNlcyBEaXZpc2lvbjEhMB8GA1UEAxMYVGhhd3RlIFBlcnNvbmFsIEJhc2lj
IENBMSgwJgYJKoZIhvcNAQkBFhlwZXJzb25hbC1iYXNpY0B0aGF3dGUuY29tMB4X
DTk2MDEwMTAwMDAwMFoXDTIwMTIzMTIzNTk1OVowgcsxCzAJBgNVBAYTAlpBMRUw
EwYDVQQIEwxXZXN0ZXJuIENhcGUxEjAQBgNVBAcTCUNhcGUgVG93bjEaMBgGA1UE
ChMRVGhhd3RlIENvbnN1bHRpbmcxKDAmBgNVBAsTH0NlcnRpZmljYXRpb24gU2Vy
dmljZXMgRGl2aXNpb24xITAfBgNVBAMTGFRoYXd0ZSBQZXJzb25hbCBCYXNpYyBD
QTEoMCYGCSqGSIb3DQEJARYZcGVyc29uYWwtYmFzaWNAdGhhd3RlLmNvbTCBnzAN
BgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEAvLyTU23AUE+CFeZIlDWmWr5vQvoPR+53
dXLdjUmbllegeNTKP1GzaQuRdhciB5dqxFGTS+CN7zeVoQxN2jSQHReJl+A1OFdK
wPQIcOk8RHtQfmGakOMj04gRRif1CwcOu93RfyAKiLlWCy4cgNrx454p7xS9CkT7
G1sY0b8jkyECAwEAAaMTMBEwDwYDVR0TAQH/BAUwAwEB/zANBgkqhkiG9w0BAQQF
AAOBgQAt4plrsD16iddZopQBHyvdEktTwq1/qqcAXJFAVyVKOKqEcLnZgA+le1z7
c8a914phXAPjLSeoF+CEhULcXpvGt7Jtu3Sv5D/Lp7ew4F2+eIMllNLbgQ95B21P
9DkVWlIBe94y1k049hJcBlDfBVu9FEuh3ym6O0GN92NWod8isQ==
-----END CERTIFICATE-----
Thawte Personal Premium CA
==========================
-----BEGIN CERTIFICATE-----
MIIDKTCCApKgAwIBAgIBADANBgkqhkiG9w0BAQQFADCBzzELMAkGA1UEBhMCWkEx
FTATBgNVBAgTDFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJQ2FwZSBUb3duMRowGAYD
VQQKExFUaGF3dGUgQ29uc3VsdGluZzEoMCYGA1UECxMfQ2VydGlmaWNhdGlvbiBT
ZXJ2aWNlcyBEaXZpc2lvbjEjMCEGA1UEAxMaVGhhd3RlIFBlcnNvbmFsIFByZW1p
dW0gQ0ExKjAoBgkqhkiG9w0BCQEWG3BlcnNvbmFsLXByZW1pdW1AdGhhd3RlLmNv
bTAeFw05NjAxMDEwMDAwMDBaFw0yMDEyMzEyMzU5NTlaMIHPMQswCQYDVQQGEwJa
QTEVMBMGA1UECBMMV2VzdGVybiBDYXBlMRIwEAYDVQQHEwlDYXBlIFRvd24xGjAY
BgNVBAoTEVRoYXd0ZSBDb25zdWx0aW5nMSgwJgYDVQQLEx9DZXJ0aWZpY2F0aW9u
IFNlcnZpY2VzIERpdmlzaW9uMSMwIQYDVQQDExpUaGF3dGUgUGVyc29uYWwgUHJl
bWl1bSBDQTEqMCgGCSqGSIb3DQEJARYbcGVyc29uYWwtcHJlbWl1bUB0aGF3dGUu
Y29tMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDJZtn4B0TPuYwu8KHvE0Vs
Bd/eJxZRNkERbGw77f4QfRKe5ZtCmv5gMcNmt3M6SK5O0DI3lIi1DbbZ8/JE2dWI
Et12TfIa/G8jHnrx2JhFTgcQ7xZC0EN1bUre4qrJMf8fAHB8Zs8QJQi6+u4A6UYD
ZicRFTuqW/KY3TZCstqIdQIDAQABoxMwETAPBgNVHRMBAf8EBTADAQH/MA0GCSqG
SIb3DQEBBAUAA4GBAGk2ifc0KjNyL2071CKyuG+axTZmDhs8obF1Wub9NdP4qPIH
b4Vnjt4rueIXsDqg8A6iAJrf8xQVbrvIhVqYgPn/vnQdPfP+MCXRNzRn+qVxeTBh
KXLA4CxM+1bkOqhv5TJZUtt1KFBZDPgLGeSs2a+WjS9Q2wfD6h+rM+D1KzGJ
-----END CERTIFICATE-----
Thawte Personal Freemail CA
===========================
-----BEGIN CERTIFICATE-----
MIIDLTCCApagAwIBAgIBADANBgkqhkiG9w0BAQQFADCB0TELMAkGA1UEBhMCWkEx
FTATBgNVBAgTDFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJQ2FwZSBUb3duMRowGAYD
VQQKExFUaGF3dGUgQ29uc3VsdGluZzEoMCYGA1UECxMfQ2VydGlmaWNhdGlvbiBT
ZXJ2aWNlcyBEaXZpc2lvbjEkMCIGA1UEAxMbVGhhd3RlIFBlcnNvbmFsIEZyZWVt
YWlsIENBMSswKQYJKoZIhvcNAQkBFhxwZXJzb25hbC1mcmVlbWFpbEB0aGF3dGUu
Y29tMB4XDTk2MDEwMTAwMDAwMFoXDTIwMTIzMTIzNTk1OVowgdExCzAJBgNVBAYT
AlpBMRUwEwYDVQQIEwxXZXN0ZXJuIENhcGUxEjAQBgNVBAcTCUNhcGUgVG93bjEa
MBgGA1UEChMRVGhhd3RlIENvbnN1bHRpbmcxKDAmBgNVBAsTH0NlcnRpZmljYXRp
b24gU2VydmljZXMgRGl2aXNpb24xJDAiBgNVBAMTG1RoYXd0ZSBQZXJzb25hbCBG
cmVlbWFpbCBDQTErMCkGCSqGSIb3DQEJARYccGVyc29uYWwtZnJlZW1haWxAdGhh
d3RlLmNvbTCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEA1GnX1LCUZFtx6UfY
DFG26nKRsIRefS0Nj3sS34UldSh0OkIsYyeflXtL734Zhx2G6qPduc6WZBrCFG5E
rHzmj+hND3EfQDimAKOHePb5lIZererAXnbr2RSjXW56fAylS1V/Bhkpf56aJtVq
uzgkCGqYx7Hao5iR/Xnb5VrEHLkCAwEAAaMTMBEwDwYDVR0TAQH/BAUwAwEB/zAN
BgkqhkiG9w0BAQQFAAOBgQDH7JJ+Tvj1lqVnYiqk8E0RYNBvjWBYYawmu1I1XAjP
MPuoSpaKH2JCI4wXD/S6ZJwXrEcp352YXtJsYHFcoqzceePnbgBHH7UNKOgCneSa
/RP0ptl8sfjcXyMmCZGAc9AUG95DqYMl8uacLxXK/qarigd1iwzdUYRr5PjRznei
gQ==
-----END CERTIFICATE-----
Thawte Server CA
================
-----BEGIN CERTIFICATE-----
MIIDEzCCAnygAwIBAgIBATANBgkqhkiG9w0BAQQFADCBxDELMAkGA1UEBhMCWkEx
FTATBgNVBAgTDFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJQ2FwZSBUb3duMR0wGwYD
VQQKExRUaGF3dGUgQ29uc3VsdGluZyBjYzEoMCYGA1UECxMfQ2VydGlmaWNhdGlv
biBTZXJ2aWNlcyBEaXZpc2lvbjEZMBcGA1UEAxMQVGhhd3RlIFNlcnZlciBDQTEm
MCQGCSqGSIb3DQEJARYXc2VydmVyLWNlcnRzQHRoYXd0ZS5jb20wHhcNOTYwODAx
MDAwMDAwWhcNMjAxMjMxMjM1OTU5WjCBxDELMAkGA1UEBhMCWkExFTATBgNVBAgT
DFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJQ2FwZSBUb3duMR0wGwYDVQQKExRUaGF3
dGUgQ29uc3VsdGluZyBjYzEoMCYGA1UECxMfQ2VydGlmaWNhdGlvbiBTZXJ2aWNl
cyBEaXZpc2lvbjEZMBcGA1UEAxMQVGhhd3RlIFNlcnZlciBDQTEmMCQGCSqGSIb3
DQEJARYXc2VydmVyLWNlcnRzQHRoYXd0ZS5jb20wgZ8wDQYJKoZIhvcNAQEBBQAD
gY0AMIGJAoGBANOkUG7I/1Zr5s9dtuoMaHVHoqrC2oQl/Kj0R1HahbUgdJSGHg91
yekIYfUGbTBuFRkC6VLAYttNmZ7iagxEOM3+vuNkCXDF/rFrKbYvScg71CcEJRCX
L+eQbcAoQpnXTEPew/UhbVSfXcNY4cDk2VuwuNy0e982OsK1ZiIS1ocNAgMBAAGj
EzARMA8GA1UdEwEB/wQFMAMBAf8wDQYJKoZIhvcNAQEEBQADgYEAB/pMaVz7lcxG
7oWDTSEwjsrZqG9JGubaUeNgcGyEYRGhGshIPllDfU+VPaGLtwtimHp1it2ITk6e
QNuozDJ0uW8NxuOzRAvZim+aKZuZGCg70eNAKJpaPNW15yAbi8qkq43pUdniTCxZ
qdq5snUb9kLy78fyGPmJvKP/iiMucEc=
-----END CERTIFICATE-----
Thawte Premium Server CA
========================
-----BEGIN CERTIFICATE-----
MIIDJzCCApCgAwIBAgIBATANBgkqhkiG9w0BAQQFADCBzjELMAkGA1UEBhMCWkEx
FTATBgNVBAgTDFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJQ2FwZSBUb3duMR0wGwYD
VQQKExRUaGF3dGUgQ29uc3VsdGluZyBjYzEoMCYGA1UECxMfQ2VydGlmaWNhdGlv
biBTZXJ2aWNlcyBEaXZpc2lvbjEhMB8GA1UEAxMYVGhhd3RlIFByZW1pdW0gU2Vy
dmVyIENBMSgwJgYJKoZIhvcNAQkBFhlwcmVtaXVtLXNlcnZlckB0aGF3dGUuY29t
MB4XDTk2MDgwMTAwMDAwMFoXDTIwMTIzMTIzNTk1OVowgc4xCzAJBgNVBAYTAlpB
MRUwEwYDVQQIEwxXZXN0ZXJuIENhcGUxEjAQBgNVBAcTCUNhcGUgVG93bjEdMBsG
A1UEChMUVGhhd3RlIENvbnN1bHRpbmcgY2MxKDAmBgNVBAsTH0NlcnRpZmljYXRp
b24gU2VydmljZXMgRGl2aXNpb24xITAfBgNVBAMTGFRoYXd0ZSBQcmVtaXVtIFNl
cnZlciBDQTEoMCYGCSqGSIb3DQEJARYZcHJlbWl1bS1zZXJ2ZXJAdGhhd3RlLmNv
bTCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEA0jY2aovXwlue2oFBYo847kkE
VdbQ7xwblRZH7xhINTpS9CtqBo87L+pW46+GjZ4X9560ZXUCTe/LCaIhUdib0GfQ
ug2SBhRz1JPLlyoAnFxODLz6FVL88kRu2hFKbgifLy3j+ao6hnO2RlNYyIkFvYMR
uHM/qgeN9EJN50CdHDcCAwEAAaMTMBEwDwYDVR0TAQH/BAUwAwEB/zANBgkqhkiG
9w0BAQQFAAOBgQAmSCwWwlj66BZ0DKqqX1Q/8tfJeGBeXm43YyJ3Nn6yF8Q0ufUI
hfzJATj/Tb7yFkJD57taRvvBxhEf8UqwKEbJw8RCfbz6q1lu1bdRiBHjpIUZa4JM
pAwSremkrj/xw0llmozFyD4lt5SZu5IycQfwhl7tUCemDaYj+bvLpgcUQg==
-----END CERTIFICATE-----
Equifax Secure CA
=================
-----BEGIN CERTIFICATE-----
MIIDIDCCAomgAwIBAgIENd70zzANBgkqhkiG9w0BAQUFADBOMQswCQYDVQQGEwJV
UzEQMA4GA1UEChMHRXF1aWZheDEtMCsGA1UECxMkRXF1aWZheCBTZWN1cmUgQ2Vy
dGlmaWNhdGUgQXV0aG9yaXR5MB4XDTk4MDgyMjE2NDE1MVoXDTE4MDgyMjE2NDE1
MVowTjELMAkGA1UEBhMCVVMxEDAOBgNVBAoTB0VxdWlmYXgxLTArBgNVBAsTJEVx
dWlmYXggU2VjdXJlIENlcnRpZmljYXRlIEF1dGhvcml0eTCBnzANBgkqhkiG9w0B
AQEFAAOBjQAwgYkCgYEAwV2xWGcIYu6gmi0fCG2RFGiYCh7+2gRvE4RiIcPRfM6f
BeC4AfBONOziipUEZKzxa1NfBbPLZ4C/QgKO/t0BCezhABRP/PvwDN1Dulsr4R+A
cJkVV5MW8Q+XarfCaCMczE1ZMKxRHjuvK9buY0V7xdlfUNLjUA86iOe/FP3gx7kC
AwEAAaOCAQkwggEFMHAGA1UdHwRpMGcwZaBjoGGkXzBdMQswCQYDVQQGEwJVUzEQ
MA4GA1UEChMHRXF1aWZheDEtMCsGA1UECxMkRXF1aWZheCBTZWN1cmUgQ2VydGlm
aWNhdGUgQXV0aG9yaXR5MQ0wCwYDVQQDEwRDUkwxMBoGA1UdEAQTMBGBDzIwMTgw
ODIyMTY0MTUxWjALBgNVHQ8EBAMCAQYwHwYDVR0jBBgwFoAUSOZo+SvSspXXR9gj
IBBPM5iQn9QwHQYDVR0OBBYEFEjmaPkr0rKV10fYIyAQTzOYkJ/UMAwGA1UdEwQF
MAMBAf8wGgYJKoZIhvZ9B0EABA0wCxsFVjMuMGMDAgbAMA0GCSqGSIb3DQEBBQUA
A4GBAFjOKer89961zgK5F7WF0bnj4JXMJTENAKaSbn+2kmOeUJXRmm/kEd5jhW6Y
7qj/WsjTVbJmcVfewCHrPSqnI0kBBIZCe/zuf6IWUrVnZ9NA2zsmWLIodz2uFHdh
1voqZiegDfqnc1zqcPGUIWVEX/r87yloqaKHee9570+sB3c4
-----END CERTIFICATE-----
Verisign Class 1 Public Primary Certification Authority
=======================================================
-----BEGIN CERTIFICATE-----
MIICPTCCAaYCEQDNun9W8N/kvFT+IqyzcqpVMA0GCSqGSIb3DQEBAgUAMF8xCzAJ
BgNVBAYTAlVTMRcwFQYDVQQKEw5WZXJpU2lnbiwgSW5jLjE3MDUGA1UECxMuQ2xh
c3MgMSBQdWJsaWMgUHJpbWFyeSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eTAeFw05
NjAxMjkwMDAwMDBaFw0yODA4MDEyMzU5NTlaMF8xCzAJBgNVBAYTAlVTMRcwFQYD
VQQKEw5WZXJpU2lnbiwgSW5jLjE3MDUGA1UECxMuQ2xhc3MgMSBQdWJsaWMgUHJp
bWFyeSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eTCBnzANBgkqhkiG9w0BAQEFAAOB
jQAwgYkCgYEA5Rm/baNWYS2ZSHH2Z965jeu3noaACpEO+jglr0aIguVzqKCbJF0N
H8xlbgyw0FaEGIeaBpsQoXPftFg5a27B9hXVqKg/qhIGjTGsf7A01480Z4gJzRQR
4k5FVmkfeAKA2txHkSm7NsljXMXg1y2He6G3MrB7MLoqLzGq7qNn2tsCAwEAATAN
BgkqhkiG9w0BAQIFAAOBgQBMP7iLxmjf7kMzDl3ppssHhE16M/+SG/Q2rdiVIjZo
EWx8QszznC7EBz8UsA9P/5CSdvnivErpj82ggAr3xSnxgiJduLHdgSOjeyUVRjB5
FvjqBUuUfx3CHMjjt/QQQDwTw18fU+hI5Ia0e6E1sHslurjTjqs/OJ0ANACY89Fx
lA==
-----END CERTIFICATE-----
Verisign Class 2 Public Primary Certification Authority
=======================================================
-----BEGIN CERTIFICATE-----
MIICPDCCAaUCEC0b/EoXjaOR6+f/9YtFvgswDQYJKoZIhvcNAQECBQAwXzELMAkG
A1UEBhMCVVMxFzAVBgNVBAoTDlZlcmlTaWduLCBJbmMuMTcwNQYDVQQLEy5DbGFz
cyAyIFB1YmxpYyBQcmltYXJ5IENlcnRpZmljYXRpb24gQXV0aG9yaXR5MB4XDTk2
MDEyOTAwMDAwMFoXDTI4MDgwMTIzNTk1OVowXzELMAkGA1UEBhMCVVMxFzAVBgNV
BAoTDlZlcmlTaWduLCBJbmMuMTcwNQYDVQQLEy5DbGFzcyAyIFB1YmxpYyBQcmlt
YXJ5IENlcnRpZmljYXRpb24gQXV0aG9yaXR5MIGfMA0GCSqGSIb3DQEBAQUAA4GN
ADCBiQKBgQC2WoujDWojg4BrzzmH9CETMwZMJaLtVRKXxaeAufqDwSCg+i8VDXyh
YGt+eSz6Bg86rvYbb7HS/y8oUl+DfUvEerf4Zh+AVPy3wo5ZShRXRtGak75BkQO7
FYCTXOvnzAhsPz6zSvz/S2wj1VCCJkQZjiPDceoZJEcEnnW/yKYAHwIDAQABMA0G
CSqGSIb3DQEBAgUAA4GBAIobK/o5wXTXXtgZZKJYSi034DNHD6zt96rbHuSLBlxg
J8pFUs4W7z8GZOeUaHxgMxURaa+dYo2jA1Rrpr7l7gUYYAS/QoD90KioHgE796Nc
r6Pc5iaAIzy4RHT3Cq5Ji2F4zCS/iIqnDupzGUH9TQPwiNHleI2lKk/2lw0Xd8rY
-----END CERTIFICATE-----
Verisign Class 3 Public Primary Certification Authority
=======================================================
-----BEGIN CERTIFICATE-----
MIICPDCCAaUCEHC65B0Q2Sk0tjjKewPMur8wDQYJKoZIhvcNAQECBQAwXzELMAkG
A1UEBhMCVVMxFzAVBgNVBAoTDlZlcmlTaWduLCBJbmMuMTcwNQYDVQQLEy5DbGFz
cyAzIFB1YmxpYyBQcmltYXJ5IENlcnRpZmljYXRpb24gQXV0aG9yaXR5MB4XDTk2
MDEyOTAwMDAwMFoXDTI4MDgwMTIzNTk1OVowXzELMAkGA1UEBhMCVVMxFzAVBgNV
BAoTDlZlcmlTaWduLCBJbmMuMTcwNQYDVQQLEy5DbGFzcyAzIFB1YmxpYyBQcmlt
YXJ5IENlcnRpZmljYXRpb24gQXV0aG9yaXR5MIGfMA0GCSqGSIb3DQEBAQUAA4GN
ADCBiQKBgQDJXFme8huKARS0EN8EQNvjV69qRUCPhAwL0TPZ2RHP7gJYHyX3KqhE
BarsAx94f56TuZoAqiN91qyFomNFx3InzPRMxnVx0jnvT0Lwdd8KkMaOIG+YD/is
I19wKTakyYbnsZogy1Olhec9vn2a/iRFM9x2Fe0PonFkTGUugWhFpwIDAQABMA0G
CSqGSIb3DQEBAgUAA4GBALtMEivPLCYATxQT3ab7/AoRhIzzKBxnki98tsX63/Do
lbwdj2wsqFHMc9ikwFPwTtYmwHYBV4GSXiHx0bH/59AhWM1pF+NEHJwZRDmJXNyc
AA9WjQKZ7aKQRUzkuxCkPfAyAw7xzvjoyVGM5mKf5p/AfbdynMk2OmufTqj/ZA1k
-----END CERTIFICATE-----
Verisign Class 1 Public Primary Certification Authority - G2
============================================================
-----BEGIN CERTIFICATE-----
MIIDAjCCAmsCEEzH6qqYPnHTkxD4PTqJkZIwDQYJKoZIhvcNAQEFBQAwgcExCzAJ
BgNVBAYTAlVTMRcwFQYDVQQKEw5WZXJpU2lnbiwgSW5jLjE8MDoGA1UECxMzQ2xh
c3MgMSBQdWJsaWMgUHJpbWFyeSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eSAtIEcy
MTowOAYDVQQLEzEoYykgMTk5OCBWZXJpU2lnbiwgSW5jLiAtIEZvciBhdXRob3Jp
emVkIHVzZSBvbmx5MR8wHQYDVQQLExZWZXJpU2lnbiBUcnVzdCBOZXR3b3JrMB4X
DTk4MDUxODAwMDAwMFoXDTI4MDgwMTIzNTk1OVowgcExCzAJBgNVBAYTAlVTMRcw
FQYDVQQKEw5WZXJpU2lnbiwgSW5jLjE8MDoGA1UECxMzQ2xhc3MgMSBQdWJsaWMg
UHJpbWFyeSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eSAtIEcyMTowOAYDVQQLEzEo
YykgMTk5OCBWZXJpU2lnbiwgSW5jLiAtIEZvciBhdXRob3JpemVkIHVzZSBvbmx5
MR8wHQYDVQQLExZWZXJpU2lnbiBUcnVzdCBOZXR3b3JrMIGfMA0GCSqGSIb3DQEB
AQUAA4GNADCBiQKBgQCq0Lq+Fi24g9TK0g+8djHKlNgdk4xWArzZbxpvUjZudVYK
VdPfQ4chEWWKfo+9Id5rMj8bhDSVBZ1BNeuS65bdqlk/AVNtmU/t5eIqWpDBucSm
Fc/IReumXY6cPvBkJHalzasab7bYe1FhbqZ/h8jit+U03EGI6glAvnOSPWvndQID
AQABMA0GCSqGSIb3DQEBBQUAA4GBAKlPww3HZ74sy9mozS11534Vnjty637rXC0J
h9ZrbWB85a7FkCMMXErQr7Fd88e2CtvgFZMN3QO8x3aKtd1Pw5sTdbgBwObJW2ul
uIncrKTdcu1OofdPvAbT6shkdHvClUGcZXNY8ZCaPGqxmMnEh7zPRW1F4m4iP/68
DzFc6PLZ
-----END CERTIFICATE-----
Verisign Class 2 Public Primary Certification Authority - G2
============================================================
-----BEGIN CERTIFICATE-----
MIIDAzCCAmwCEQC5L2DMiJ+hekYJuFtwbIqvMA0GCSqGSIb3DQEBBQUAMIHBMQsw
CQYDVQQGEwJVUzEXMBUGA1UEChMOVmVyaVNpZ24sIEluYy4xPDA6BgNVBAsTM0Ns
YXNzIDIgUHVibGljIFByaW1hcnkgQ2VydGlmaWNhdGlvbiBBdXRob3JpdHkgLSBH
MjE6MDgGA1UECxMxKGMpIDE5OTggVmVyaVNpZ24sIEluYy4gLSBGb3IgYXV0aG9y
aXplZCB1c2Ugb25seTEfMB0GA1UECxMWVmVyaVNpZ24gVHJ1c3QgTmV0d29yazAe
Fw05ODA1MTgwMDAwMDBaFw0yODA4MDEyMzU5NTlaMIHBMQswCQYDVQQGEwJVUzEX
MBUGA1UEChMOVmVyaVNpZ24sIEluYy4xPDA6BgNVBAsTM0NsYXNzIDIgUHVibGlj
IFByaW1hcnkgQ2VydGlmaWNhdGlvbiBBdXRob3JpdHkgLSBHMjE6MDgGA1UECxMx
KGMpIDE5OTggVmVyaVNpZ24sIEluYy4gLSBGb3IgYXV0aG9yaXplZCB1c2Ugb25s
eTEfMB0GA1UECxMWVmVyaVNpZ24gVHJ1c3QgTmV0d29yazCBnzANBgkqhkiG9w0B
AQEFAAOBjQAwgYkCgYEAp4gBIXQs5xoD8JjhlzwPIQjxnNuX6Zr8wgQGE75fUsjM
HiwSViy4AWkszJkfrbCWrnkE8hM5wXuYuggs6MKEEyyqaekJ9MepAqRCwiNPStjw
DqL7MWzJ5m+ZJwf15vRMeJ5t60aG+rmGyVTyssSv1EYcWskVMP8NbPUtDm3Of3cC
AwEAATANBgkqhkiG9w0BAQUFAAOBgQByLvl/0fFx+8Se9sVeUYpAmLho+Jscg9ji
nb3/7aHmZuovCfTK1+qlK5X2JGCGTUQug6XELaDTrnhpb3LabK4I8GOSN+a7xDAX
rXfMSTWqz9iP0b63GJZHc2pUIjRkLbYWm1lbtFFZOrMLFPQS32eg9K0yZF6xRnIn
jBJ7xUS0rg==
-----END CERTIFICATE-----
Verisign Class 3 Public Primary Certification Authority - G2
============================================================
-----BEGIN CERTIFICATE-----
MIIDAjCCAmsCEH3Z/gfPqB63EHln+6eJNMYwDQYJKoZIhvcNAQEFBQAwgcExCzAJ
BgNVBAYTAlVTMRcwFQYDVQQKEw5WZXJpU2lnbiwgSW5jLjE8MDoGA1UECxMzQ2xh
c3MgMyBQdWJsaWMgUHJpbWFyeSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eSAtIEcy
MTowOAYDVQQLEzEoYykgMTk5OCBWZXJpU2lnbiwgSW5jLiAtIEZvciBhdXRob3Jp
emVkIHVzZSBvbmx5MR8wHQYDVQQLExZWZXJpU2lnbiBUcnVzdCBOZXR3b3JrMB4X
DTk4MDUxODAwMDAwMFoXDTI4MDgwMTIzNTk1OVowgcExCzAJBgNVBAYTAlVTMRcw
FQYDVQQKEw5WZXJpU2lnbiwgSW5jLjE8MDoGA1UECxMzQ2xhc3MgMyBQdWJsaWMg
UHJpbWFyeSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eSAtIEcyMTowOAYDVQQLEzEo
YykgMTk5OCBWZXJpU2lnbiwgSW5jLiAtIEZvciBhdXRob3JpemVkIHVzZSBvbmx5
MR8wHQYDVQQLExZWZXJpU2lnbiBUcnVzdCBOZXR3b3JrMIGfMA0GCSqGSIb3DQEB
AQUAA4GNADCBiQKBgQDMXtERXVxp0KvTuWpMmR9ZmDCOFoUgRm1HP9SFIIThbbP4
pO0M8RcPO/mn+SXXwc+EY/J8Y8+iR/LGWzOOZEAEaMGAuWQcRXfH2G71lSk8UOg0
13gfqLptQ5GVj0VXXn7F+8qkBOvqlzdUMG+7AUcyM83cV5tkaWH4mx0ciU9cZwID
AQABMA0GCSqGSIb3DQEBBQUAA4GBAFFNzb5cy5gZnBWyATl4Lk0PZ3BwmcYQWpSk
U01UbSuvDV1Ai2TT1+7eVmGSX6bEHRBhNtMsJzzoKQm5EWR0zLVznxxIqbxhAe7i
F6YM40AIOw7n60RzKprxaZLvcRTDOaxxp5EJb+RxBrO6WVcmeQD2+A2iMzAo1KpY
oJ2daZH9
-----END CERTIFICATE-----
Verisign Class 4 Public Primary Certification Authority - G2
============================================================
-----BEGIN CERTIFICATE-----
MIIDAjCCAmsCEDKIjprS9esTR/h/xCA3JfgwDQYJKoZIhvcNAQEFBQAwgcExCzAJ
BgNVBAYTAlVTMRcwFQYDVQQKEw5WZXJpU2lnbiwgSW5jLjE8MDoGA1UECxMzQ2xh
c3MgNCBQdWJsaWMgUHJpbWFyeSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eSAtIEcy
MTowOAYDVQQLEzEoYykgMTk5OCBWZXJpU2lnbiwgSW5jLiAtIEZvciBhdXRob3Jp
emVkIHVzZSBvbmx5MR8wHQYDVQQLExZWZXJpU2lnbiBUcnVzdCBOZXR3b3JrMB4X
DTk4MDUxODAwMDAwMFoXDTI4MDgwMTIzNTk1OVowgcExCzAJBgNVBAYTAlVTMRcw
FQYDVQQKEw5WZXJpU2lnbiwgSW5jLjE8MDoGA1UECxMzQ2xhc3MgNCBQdWJsaWMg
UHJpbWFyeSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eSAtIEcyMTowOAYDVQQLEzEo
YykgMTk5OCBWZXJpU2lnbiwgSW5jLiAtIEZvciBhdXRob3JpemVkIHVzZSBvbmx5
MR8wHQYDVQQLExZWZXJpU2lnbiBUcnVzdCBOZXR3b3JrMIGfMA0GCSqGSIb3DQEB
AQUAA4GNADCBiQKBgQC68OTP+cSuhVS5B1f5j8V/aBH4xBewRNzjMHPVKmIquNDM
HO0oW369atyzkSTKQWI8/AIBvxwWMZQFl3Zuoq29YRdsTjCG8FE3KlDHqGKB3FtK
qsGgtG7rL+VXxbErQHDbWk2hjh+9Ax/YA9SPTJlxvOKCzFjomDqG04Y48wApHwID
AQABMA0GCSqGSIb3DQEBBQUAA4GBAIWMEsGnuVAVess+rLhDityq3RS6iYF+ATwj
cSGIL4LcY/oCRaxFWdcqWERbt5+BO5JoPeI3JPV7bI92NZYJqFmduc4jq3TWg/0y
cyfYaT5DdPauxYma51N86Xv2S/PBZYPejYqcPIiNOVn8qj8ijaHBZlCBckztImRP
T8qAkbYp
-----END CERTIFICATE-----
Verisign Class 1 Public Primary Certification Authority - G3
============================================================
-----BEGIN CERTIFICATE-----
MIIEGjCCAwICEQCLW3VWhFSFCwDPrzhIzrGkMA0GCSqGSIb3DQEBBQUAMIHKMQsw
CQYDVQQGEwJVUzEXMBUGA1UEChMOVmVyaVNpZ24sIEluYy4xHzAdBgNVBAsTFlZl
cmlTaWduIFRydXN0IE5ldHdvcmsxOjA4BgNVBAsTMShjKSAxOTk5IFZlcmlTaWdu
LCBJbmMuIC0gRm9yIGF1dGhvcml6ZWQgdXNlIG9ubHkxRTBDBgNVBAMTPFZlcmlT
aWduIENsYXNzIDEgUHVibGljIFByaW1hcnkgQ2VydGlmaWNhdGlvbiBBdXRob3Jp
dHkgLSBHMzAeFw05OTEwMDEwMDAwMDBaFw0zNjA3MTYyMzU5NTlaMIHKMQswCQYD
VQQGEwJVUzEXMBUGA1UEChMOVmVyaVNpZ24sIEluYy4xHzAdBgNVBAsTFlZlcmlT
aWduIFRydXN0IE5ldHdvcmsxOjA4BgNVBAsTMShjKSAxOTk5IFZlcmlTaWduLCBJ
bmMuIC0gRm9yIGF1dGhvcml6ZWQgdXNlIG9ubHkxRTBDBgNVBAMTPFZlcmlTaWdu
IENsYXNzIDEgUHVibGljIFByaW1hcnkgQ2VydGlmaWNhdGlvbiBBdXRob3JpdHkg
LSBHMzCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAN2E1Lm0+afY8wR4
nN493GwTFtl63SRRZsDHJlkNrAYIwpTRMx/wgzUfbhvI3qpuFU5UJ+/EbRrsC+MO
8ESlV8dAWB6jRx9x7GD2bZTIGDnt/kIYVt/kTEkQeE4BdjVjEjbdZrwBBDajVWjV
ojYJrKshJlQGrT/KFOCsyq0GHZXi+J3x4GD/wn91K0zM2v6HmSHquv4+VNfSWXjb
PG7PoBMAGrgnoeS+Z5bKoMWznN3JdZ7rMJpfo83ZrngZPyPpXNspva1VyBtUjGP2
6KbqxzcSXKMpHgLZ2x87tNcPVkeBFQRKr4Mn0cVYiMHd9qqnoxjaaKptEVHhv2Vr
n5Z20T0CAwEAATANBgkqhkiG9w0BAQUFAAOCAQEAq2aN17O6x5q25lXQBfGfMY1a
qtmqRiYPce2lrVNWYgFHKkTp/j90CxObufRNG7LRX7K20ohcs5/Ny9Sn2WCVhDr4
wTcdYcrnsMXlkdpUpqwxga6X3s0IrLjAl4B/bnKk52kTlWUfxJM8/XmPBNQ+T+r3
ns7NZ3xPZQL/kYVUc8f/NveGLezQXk//EZ9yBta4GvFMDSZl4kSAHsef493oCtrs
pSCAaWihT37ha88HQfqDjrw43bAuEbFrskLMmrz5SCJ5ShkPshw+IHTZasO+8ih4
E1Z5T21Q6huwtVexN2ZYI/PcD98Kh8TvhgXVOBRgmaNL3gaWcSzy27YfpO8/7g==
-----END CERTIFICATE-----
Verisign Class 2 Public Primary Certification Authority - G3
============================================================
-----BEGIN CERTIFICATE-----
MIIEGTCCAwECEGFwy0mMX5hFKeewptlQW3owDQYJKoZIhvcNAQEFBQAwgcoxCzAJ
BgNVBAYTAlVTMRcwFQYDVQQKEw5WZXJpU2lnbiwgSW5jLjEfMB0GA1UECxMWVmVy
aVNpZ24gVHJ1c3QgTmV0d29yazE6MDgGA1UECxMxKGMpIDE5OTkgVmVyaVNpZ24s
IEluYy4gLSBGb3IgYXV0aG9yaXplZCB1c2Ugb25seTFFMEMGA1UEAxM8VmVyaVNp
Z24gQ2xhc3MgMiBQdWJsaWMgUHJpbWFyeSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0
eSAtIEczMB4XDTk5MTAwMTAwMDAwMFoXDTM2MDcxNjIzNTk1OVowgcoxCzAJBgNV
BAYTAlVTMRcwFQYDVQQKEw5WZXJpU2lnbiwgSW5jLjEfMB0GA1UECxMWVmVyaVNp
Z24gVHJ1c3QgTmV0d29yazE6MDgGA1UECxMxKGMpIDE5OTkgVmVyaVNpZ24sIElu
Yy4gLSBGb3IgYXV0aG9yaXplZCB1c2Ugb25seTFFMEMGA1UEAxM8VmVyaVNpZ24g
Q2xhc3MgMiBQdWJsaWMgUHJpbWFyeSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eSAt
IEczMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEArwoNwtUs22e5LeWU
J92lvuCwTY+zYVY81nzD9M0+hsuiiOLh2KRpxbXiv8GmR1BeRjmL1Za6tW8UvxDO
JxOeBUebMXoT2B/Z0wI3i60sR/COgQanDTAM6/c8DyAd3HJG7qUCyFvDyVZpTMUY
wZF7C9UTAJu878NIPkZgIIUq1ZC2zYugzDLdt/1AVbJQHFauzI13TccgTacxdu9o
koqQHgiBVrKtaaNS0MscxCM9H5n+TOgWY47GCI72MfbS+uV23bUckqNJzc0BzWjN
qWm6o+sdDZykIKbBoMXRRkwXbdKsZj+WjOCE1Db/IlnF+RFgqF8EffIa9iVCYQ/E
Srg+iQIDAQABMA0GCSqGSIb3DQEBBQUAA4IBAQA0JhU8wI1NQ0kdvekhktdmnLfe
xbjQ5F1fdiLAJvmEOjr5jLX77GDx6M4EsMjdpwOPMPOY36TmpDHf0xwLRtxyID+u
7gU8pDM/CzmscHhzS5kr3zDCVLCoO1Wh/hYozUK9dG6A2ydEp85EXdQbkJgNHkKU
sQAsBNB0owIFImNjzYO1+8FtYmtpdf1dcEG59b98377BMnMiIYtYgXsVkXq642RI
sH/7NiXaldDxJBQX3RiAa0YjOVT1jmIJBB2UkKab5iXiQkWquJCtvgiPqQtCGJTP
cjnhsUPgKM+351psE2tJs//jGHyJizNdrDPXp/naOlXJWBD5qu9ats9LS98q
-----END CERTIFICATE-----
Verisign Class 3 Public Primary Certification Authority - G3
============================================================
-----BEGIN CERTIFICATE-----
MIIEGjCCAwICEQCbfgZJoz5iudXukEhxKe9XMA0GCSqGSIb3DQEBBQUAMIHKMQsw
CQYDVQQGEwJVUzEXMBUGA1UEChMOVmVyaVNpZ24sIEluYy4xHzAdBgNVBAsTFlZl
cmlTaWduIFRydXN0IE5ldHdvcmsxOjA4BgNVBAsTMShjKSAxOTk5IFZlcmlTaWdu
LCBJbmMuIC0gRm9yIGF1dGhvcml6ZWQgdXNlIG9ubHkxRTBDBgNVBAMTPFZlcmlT
aWduIENsYXNzIDMgUHVibGljIFByaW1hcnkgQ2VydGlmaWNhdGlvbiBBdXRob3Jp
dHkgLSBHMzAeFw05OTEwMDEwMDAwMDBaFw0zNjA3MTYyMzU5NTlaMIHKMQswCQYD
VQQGEwJVUzEXMBUGA1UEChMOVmVyaVNpZ24sIEluYy4xHzAdBgNVBAsTFlZlcmlT
aWduIFRydXN0IE5ldHdvcmsxOjA4BgNVBAsTMShjKSAxOTk5IFZlcmlTaWduLCBJ
bmMuIC0gRm9yIGF1dGhvcml6ZWQgdXNlIG9ubHkxRTBDBgNVBAMTPFZlcmlTaWdu
IENsYXNzIDMgUHVibGljIFByaW1hcnkgQ2VydGlmaWNhdGlvbiBBdXRob3JpdHkg
LSBHMzCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAMu6nFL8eB8aHm8b
N3O9+MlrlBIwT/A2R/XQkQr1F8ilYcEWQE37imGQ5XYgwREGfassbqb1EUGO+i2t
KmFZpGcmTNDovFJbcCAEWNF6yaRpvIMXZK0Fi7zQWM6NjPXr8EJJC52XJ2cybuGu
kxUccLwgTS8Y3pKI6GyFVxEa6X7jJhFUokWWVYPKMIno3Nij7SqAP395ZVc+FSBm
CC+Vk7+qRy+oRpfwEuL+wgorUeZ25rdGt+INpsyow0xZVYnm6FNcHOqd8GIWC6fJ
Xwzw3sJ2zq/3avL6QaaiMxTJ5Xpj055iN9WFZZ4O5lMkdBteHRJTW8cs54NJOxWu
imi5V5cCAwEAATANBgkqhkiG9w0BAQUFAAOCAQEAERSWwauSCPc/L8my/uRan2Te
2yFPhpk0djZX3dAVL8WtfxUfN2JzPtTnX84XA9s1+ivbrmAJXx5fj267Cz3qWhMe
DGBvtcC1IyIuBwvLqXTLR7sdwdela8wv0kL9Sd2nic9TutoAWii/gt/4uhMdUIaC
/Y4wjylGsB49Ndo4YhYYSq3mtlFs3q9i6wHQHiT+eo8SGhJouPtmmRQURVyu565p
F4ErWjfJXir0xuKhXFSbplQAz/DxwceYMBo7Nhbbo27q/a2ywtrvAkcTisDxszGt
TxzhT5yvDwyd93gN2PQ1VoDat20Xj50egWTh/sVFuq1ruQp6Tk9LhO5L8X3dEQ==
-----END CERTIFICATE-----
Verisign Class 4 Public Primary Certification Authority - G3
============================================================
-----BEGIN CERTIFICATE-----
MIIEGjCCAwICEQDsoKeLbnVqAc/EfMwvlF7XMA0GCSqGSIb3DQEBBQUAMIHKMQsw
CQYDVQQGEwJVUzEXMBUGA1UEChMOVmVyaVNpZ24sIEluYy4xHzAdBgNVBAsTFlZl
cmlTaWduIFRydXN0IE5ldHdvcmsxOjA4BgNVBAsTMShjKSAxOTk5IFZlcmlTaWdu
LCBJbmMuIC0gRm9yIGF1dGhvcml6ZWQgdXNlIG9ubHkxRTBDBgNVBAMTPFZlcmlT
aWduIENsYXNzIDQgUHVibGljIFByaW1hcnkgQ2VydGlmaWNhdGlvbiBBdXRob3Jp
dHkgLSBHMzAeFw05OTEwMDEwMDAwMDBaFw0zNjA3MTYyMzU5NTlaMIHKMQswCQYD
VQQGEwJVUzEXMBUGA1UEChMOVmVyaVNpZ24sIEluYy4xHzAdBgNVBAsTFlZlcmlT
aWduIFRydXN0IE5ldHdvcmsxOjA4BgNVBAsTMShjKSAxOTk5IFZlcmlTaWduLCBJ
bmMuIC0gRm9yIGF1dGhvcml6ZWQgdXNlIG9ubHkxRTBDBgNVBAMTPFZlcmlTaWdu
IENsYXNzIDQgUHVibGljIFByaW1hcnkgQ2VydGlmaWNhdGlvbiBBdXRob3JpdHkg
LSBHMzCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAK3LpRFpxlmr8Y+1
GQ9Wzsy1HyDkniYlS+BzZYlZ3tCD5PUPtbut8XzoIfzk6AzufEUiGXaStBO3IFsJ
+mGuqPKljYXCKtbeZjbSmwL0qJJgfJxptI8kHtCGUvYynEFYHiK9zUVilQhu0Gbd
U6LM8BDcVHOLBKFGMzNcF0C5nk3T875Vg+ixiY5afJqWIpA7iCXy0lOIAgwLePLm
NxdLMEYH5IBtptiWLugs+BGzOA1mppvqySNb247i8xOOGlktqgLw7KSHZtzBP/XY
ufTsgsbSPZUd5cBPhMnZo0QoBmrXRazwa2rvTl/4EYIeOGM0ZlDUPpNz+jDDZq3/
ky2X7wMCAwEAATANBgkqhkiG9w0BAQUFAAOCAQEAj/ola09b5KROJ1WrIhVZPMq1
CtRK26vdoV9TxaBXOcLORyu+OshWv8LZJxA6sQU8wHcxuzrTBXttmhwwjIDLk5Mq
g6sFUYICABFna/OIYUdfA5PVWw3g8dShMjWFsjrbsIKr0csKvE+MW8VLADsfKoKm
fjaF3H48ZwC15DtS4KjrXRX5xm3wrR0OhbepmnMUWluPQSjA1egtTaRezarZ7c7c
2NU8Qh0XwRJdRTjDOPP8hS6DRkiy1yBfkjaP53kPmF6Z6PDQpLv1U70qzlmwr25/
bLvSHgCwIe34QWKCudiyxLtGUPMxxY8BqHTr9Xgn2uf3ZkPznoM+IKrDNWCRzg==
-----END CERTIFICATE-----
Equifax Secure Global eBusiness CA
==================================
-----BEGIN CERTIFICATE-----
MIICkDCCAfmgAwIBAgIBATANBgkqhkiG9w0BAQQFADBaMQswCQYDVQQGEwJVUzEc
MBoGA1UEChMTRXF1aWZheCBTZWN1cmUgSW5jLjEtMCsGA1UEAxMkRXF1aWZheCBT
ZWN1cmUgR2xvYmFsIGVCdXNpbmVzcyBDQS0xMB4XDTk5MDYyMTA0MDAwMFoXDTIw
MDYyMTA0MDAwMFowWjELMAkGA1UEBhMCVVMxHDAaBgNVBAoTE0VxdWlmYXggU2Vj
dXJlIEluYy4xLTArBgNVBAMTJEVxdWlmYXggU2VjdXJlIEdsb2JhbCBlQnVzaW5l
c3MgQ0EtMTCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEAuucXkAJlsTRVPEnC
UdXfp9E3j9HngXNBUmCbnaEXJnitx7HoJpQytd4zjTov2/KaelpzmKNc6fuKcxtc
58O/gGzNqfTWK8D3+ZmqY6KxRwIP1ORROhI8bIpaVIRw28HFkM9yRcuoWcDNM50/
o5brhTMhHD4ePmBudpxnhcXIw2ECAwEAAaNmMGQwEQYJYIZIAYb4QgEBBAQDAgAH
MA8GA1UdEwEB/wQFMAMBAf8wHwYDVR0jBBgwFoAUvqigdHJQa0S3ySPY+6j/s1dr
aGwwHQYDVR0OBBYEFL6ooHRyUGtEt8kj2Puo/7NXa2hsMA0GCSqGSIb3DQEBBAUA
A4GBADDiAVGqx+pf2rnQZQ8w1j7aDRRJbpGTJxQx78T3LUX47Me/okENI7SS+RkA
Z70Br83gcfxaz2TE4JaY0KNA4gGK7ycH8WUBikQtBmV1UsCGECAhX2xrD2yuCRyv
8qIYNMR1pHMc8Y3c7635s3a0kr/clRAevsvIO1qEYBlWlKlV
-----END CERTIFICATE-----
Equifax Secure eBusiness CA 1
=============================
-----BEGIN CERTIFICATE-----
MIICgjCCAeugAwIBAgIBBDANBgkqhkiG9w0BAQQFADBTMQswCQYDVQQGEwJVUzEc
MBoGA1UEChMTRXF1aWZheCBTZWN1cmUgSW5jLjEmMCQGA1UEAxMdRXF1aWZheCBT
ZWN1cmUgZUJ1c2luZXNzIENBLTEwHhcNOTkwNjIxMDQwMDAwWhcNMjAwNjIxMDQw
MDAwWjBTMQswCQYDVQQGEwJVUzEcMBoGA1UEChMTRXF1aWZheCBTZWN1cmUgSW5j
LjEmMCQGA1UEAxMdRXF1aWZheCBTZWN1cmUgZUJ1c2luZXNzIENBLTEwgZ8wDQYJ
KoZIhvcNAQEBBQADgY0AMIGJAoGBAM4vGbwXt3fek6lfWg0XTzQaDJj0ItlZ1MRo
RvC0NcWFAyDGr0WlIVFFQesWWDYyb+JQYmT5/VGcqiTZ9J2DKocKIdMSODRsjQBu
WqDZQu4aIZX5UkxVWsUPOE9G+m34LjXWHXzr4vCwdYDIqROsvojvOm6rXyo4YgKw
Env+j6YDAgMBAAGjZjBkMBEGCWCGSAGG+EIBAQQEAwIABzAPBgNVHRMBAf8EBTAD
AQH/MB8GA1UdIwQYMBaAFEp4MlIR21kWNl7fwRQ2QGpHfEyhMB0GA1UdDgQWBBRK
eDJSEdtZFjZe38EUNkBqR3xMoTANBgkqhkiG9w0BAQQFAAOBgQB1W6ibAxHm6VZM
zfmpTMANmvPMZWnmJXbMWbfWVMMdzZmsGd20hdXgPfxiIKeES1hl8eL5lSE/9dR+
WB5Hh1Q+WKG1tfgq73HnvMP2sUlG4tega+VWeponmHxGYhTnyfxuAxJ5gDgdSIKN
/Bf+KpYrtWKmpj29f5JZzVoqgrI3eQ==
-----END CERTIFICATE-----
Equifax Secure eBusiness CA 2
=============================
-----BEGIN CERTIFICATE-----
MIIDIDCCAomgAwIBAgIEN3DPtTANBgkqhkiG9w0BAQUFADBOMQswCQYDVQQGEwJV
UzEXMBUGA1UEChMORXF1aWZheCBTZWN1cmUxJjAkBgNVBAsTHUVxdWlmYXggU2Vj
dXJlIGVCdXNpbmVzcyBDQS0yMB4XDTk5MDYyMzEyMTQ0NVoXDTE5MDYyMzEyMTQ0
NVowTjELMAkGA1UEBhMCVVMxFzAVBgNVBAoTDkVxdWlmYXggU2VjdXJlMSYwJAYD
VQQLEx1FcXVpZmF4IFNlY3VyZSBlQnVzaW5lc3MgQ0EtMjCBnzANBgkqhkiG9w0B
AQEFAAOBjQAwgYkCgYEA5Dk5kx5SBhsoNviyoynF7Y6yEb3+6+e0dMKP/wXn2Z0G
vxLIPw7y1tEkshHe0XMJitSxLJgJDR5QRrKDpkWNYmi7hRsgcDKqQM2mll/EcTc/
BPO3QSQ5BxoeLmFYoBIL5aXfxavqN3HMHMg3OrmXUqesxWoklE6ce8/AatbfIb0C
AwEAAaOCAQkwggEFMHAGA1UdHwRpMGcwZaBjoGGkXzBdMQswCQYDVQQGEwJVUzEX
MBUGA1UEChMORXF1aWZheCBTZWN1cmUxJjAkBgNVBAsTHUVxdWlmYXggU2VjdXJl
IGVCdXNpbmVzcyBDQS0yMQ0wCwYDVQQDEwRDUkwxMBoGA1UdEAQTMBGBDzIwMTkw
NjIzMTIxNDQ1WjALBgNVHQ8EBAMCAQYwHwYDVR0jBBgwFoAUUJ4L6q9euSBIplBq
y/3YIHqngnYwHQYDVR0OBBYEFFCeC+qvXrkgSKZQasv92CB6p4J2MAwGA1UdEwQF
MAMBAf8wGgYJKoZIhvZ9B0EABA0wCxsFVjMuMGMDAgbAMA0GCSqGSIb3DQEBBQUA
A4GBAAyGgq3oThr1jokn4jVYPSm0B482UJW/bsGe68SQsoWou7dC4A8HOd/7npCy
0cE+U58DRLB+S/Rv5Hwf5+Kx5Lia78O9zt4LMjTZ3ijtM2vE1Nc9ElirfQkty3D1
E4qUoSek1nDFbZS1yX2doNLGCEnZZpum0/QL3MUmV+GRMOrN
-----END CERTIFICATE-----
Thawte Time Stamping CA
=======================
-----BEGIN CERTIFICATE-----
MIICoTCCAgqgAwIBAgIBADANBgkqhkiG9w0BAQQFADCBizELMAkGA1UEBhMCWkEx
FTATBgNVBAgTDFdlc3Rlcm4gQ2FwZTEUMBIGA1UEBxMLRHVyYmFudmlsbGUxDzAN
BgNVBAoTBlRoYXd0ZTEdMBsGA1UECxMUVGhhd3RlIENlcnRpZmljYXRpb24xHzAd
BgNVBAMTFlRoYXd0ZSBUaW1lc3RhbXBpbmcgQ0EwHhcNOTcwMTAxMDAwMDAwWhcN
MjAxMjMxMjM1OTU5WjCBizELMAkGA1UEBhMCWkExFTATBgNVBAgTDFdlc3Rlcm4g
Q2FwZTEUMBIGA1UEBxMLRHVyYmFudmlsbGUxDzANBgNVBAoTBlRoYXd0ZTEdMBsG
A1UECxMUVGhhd3RlIENlcnRpZmljYXRpb24xHzAdBgNVBAMTFlRoYXd0ZSBUaW1l
c3RhbXBpbmcgQ0EwgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBANYrWHhhRYZT
6jR7UZztsOYuGA7+4F+oJ9O0yeB8WU4WDnNUYMF/9p8u6TqFJBU820cEY8OexJQa
Wt9MevPZQx08EHp5JduQ/vBR5zDWQQD9nyjfeb6Uu522FOMjhdepQeBMpHmwKxqL
8vg7ij5FrHGSALSQQZj7X+36ty6K+Ig3AgMBAAGjEzARMA8GA1UdEwEB/wQFMAMB
Af8wDQYJKoZIhvcNAQEEBQADgYEAZ9viwuaHPUCDhjc1fR/OmsMMZiCouqoEiYbC
9RAIDb/LogWK0E02PvTX72nGXuSwlG9KuefeW4i2e9vjJ+V2w/A1wcu1J5szedyQ
pgCed/r8zSeUQhac0xxo7L9c3eWpexAKMnRUEzGLhQOEkbdYATAUOK8oyvyxUBkZ
CayJSdM=
-----END CERTIFICATE-----
thawte Primary Root CA
======================
-----BEGIN CERTIFICATE-----
MIIEIDCCAwigAwIBAgIQNE7VVyDV7exJ9C/ON9srbTANBgkqhkiG9w0BAQUFADCB
qTELMAkGA1UEBhMCVVMxFTATBgNVBAoTDHRoYXd0ZSwgSW5jLjEoMCYGA1UECxMf
Q2VydGlmaWNhdGlvbiBTZXJ2aWNlcyBEaXZpc2lvbjE4MDYGA1UECxMvKGMpIDIw
MDYgdGhhd3RlLCBJbmMuIC0gRm9yIGF1dGhvcml6ZWQgdXNlIG9ubHkxHzAdBgNV
BAMTFnRoYXd0ZSBQcmltYXJ5IFJvb3QgQ0EwHhcNMDYxMTE3MDAwMDAwWhcNMzYw
NzE2MjM1OTU5WjCBqTELMAkGA1UEBhMCVVMxFTATBgNVBAoTDHRoYXd0ZSwgSW5j
LjEoMCYGA1UECxMfQ2VydGlmaWNhdGlvbiBTZXJ2aWNlcyBEaXZpc2lvbjE4MDYG
A1UECxMvKGMpIDIwMDYgdGhhd3RlLCBJbmMuIC0gRm9yIGF1dGhvcml6ZWQgdXNl
IG9ubHkxHzAdBgNVBAMTFnRoYXd0ZSBQcmltYXJ5IFJvb3QgQ0EwggEiMA0GCSqG
SIb3DQEBAQUAA4IBDwAwggEKAoIBAQCsoPD7gFnUnMekz52hWXMJEEUMDSxuaPFs
W0hoSVk3/AszGcJ3f8wQLZU0HObrTQmnHNK4yZc2AreJ1CRfBsDMRJSUjQJib+ta
3RGNKJpchJAQeg29dGYvajig4tVUROsdB58Hum/u6f1OCyn1PoSgAfGcq/gcfomk
6KHYcWUNo1F77rzSImANuVud37r8UVsLr5iy6S7pBOhih94ryNdOwUxkHt3Ph1i6
Sk/KaAcdHJ1KxtUvkcx8cXIcxcBn6zL9yZJclNqFwJu/U30rCfSMnZEfl2pSy94J
NqR32HuHUETVPm4pafs5SSYeCaWAe0At6+gnhcn+Yf1+5nyXHdWdAgMBAAGjQjBA
MA8GA1UdEwEB/wQFMAMBAf8wDgYDVR0PAQH/BAQDAgEGMB0GA1UdDgQWBBR7W0XP
r87Lev0xkhpqtvNG61dIUDANBgkqhkiG9w0BAQUFAAOCAQEAeRHAS7ORtvzw6WfU
DW5FvlXok9LOAz/t2iWwHVfLHjp2oEzsUHboZHIMpKnxuIvW1oeEuzLlQRHAd9mz
YJ3rG9XRbkREqaYB7FViHXe4XI5ISXycO1cRrK1zN44veFyQaEfZYGDm/Ac9IiAX
xPcW6cTYcvnIc3zfFi8VqT79aie2oetaupgf1eNNZAqdE8hhuvU5HIe6uL17In/2
/qxAeeWsEG89jxt5dovEN7MhGITlNgDrYyCZuen+MwS7QcjBAvlEYyCegc5C09Y/
LHbTY5xZ3Y+m4Q6gLkH3LpVHz7z9M/P2C2F+fpErgUfCJzDupxBdN49cOSvkBPB7
jVaMaA==
-----END CERTIFICATE-----
VeriSign Class 3 Public Primary Certification Authority - G5
============================================================
-----BEGIN CERTIFICATE-----
MIIE0zCCA7ugAwIBAgIQGNrRniZ96LtKIVjNzGs7SjANBgkqhkiG9w0BAQUFADCB
yjELMAkGA1UEBhMCVVMxFzAVBgNVBAoTDlZlcmlTaWduLCBJbmMuMR8wHQYDVQQL
ExZWZXJpU2lnbiBUcnVzdCBOZXR3b3JrMTowOAYDVQQLEzEoYykgMjAwNiBWZXJp
U2lnbiwgSW5jLiAtIEZvciBhdXRob3JpemVkIHVzZSBvbmx5MUUwQwYDVQQDEzxW
ZXJpU2lnbiBDbGFzcyAzIFB1YmxpYyBQcmltYXJ5IENlcnRpZmljYXRpb24gQXV0
aG9yaXR5IC0gRzUwHhcNMDYxMTA4MDAwMDAwWhcNMzYwNzE2MjM1OTU5WjCByjEL
MAkGA1UEBhMCVVMxFzAVBgNVBAoTDlZlcmlTaWduLCBJbmMuMR8wHQYDVQQLExZW
ZXJpU2lnbiBUcnVzdCBOZXR3b3JrMTowOAYDVQQLEzEoYykgMjAwNiBWZXJpU2ln
biwgSW5jLiAtIEZvciBhdXRob3JpemVkIHVzZSBvbmx5MUUwQwYDVQQDEzxWZXJp
U2lnbiBDbGFzcyAzIFB1YmxpYyBQcmltYXJ5IENlcnRpZmljYXRpb24gQXV0aG9y
aXR5IC0gRzUwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCvJAgIKXo1
nmAMqudLO07cfLw8RRy7K+D+KQL5VwijZIUVJ/XxrcgxiV0i6CqqpkKzj/i5Vbex
t0uz/o9+B1fs70PbZmIVYc9gDaTY3vjgw2IIPVQT60nKWVSFJuUrjxuf6/WhkcIz
SdhDY2pSS9KP6HBRTdGJaXvHcPaz3BJ023tdS1bTlr8Vd6Gw9KIl8q8ckmcY5fQG
BO+QueQA5N06tRn/Arr0PO7gi+s3i+z016zy9vA9r911kTMZHRxAy3QkGSGT2RT+
rCpSx4/VBEnkjWNHiDxpg8v+R70rfk/Fla4OndTRQ8Bnc+MUCH7lP59zuDMKz10/
NIeWiu5T6CUVAgMBAAGjgbIwga8wDwYDVR0TAQH/BAUwAwEB/zAOBgNVHQ8BAf8E
BAMCAQYwbQYIKwYBBQUHAQwEYTBfoV2gWzBZMFcwVRYJaW1hZ2UvZ2lmMCEwHzAH
BgUrDgMCGgQUj+XTGoasjY5rw8+AatRIGCx7GS4wJRYjaHR0cDovL2xvZ28udmVy
aXNpZ24uY29tL3ZzbG9nby5naWYwHQYDVR0OBBYEFH/TZafC3ey78DAJ80M5+gKv
MzEzMA0GCSqGSIb3DQEBBQUAA4IBAQCTJEowX2LP2BqYLz3q3JktvXf2pXkiOOzE
p6B4Eq1iDkVwZMXnl2YtmAl+X6/WzChl8gGqCBpH3vn5fJJaCGkgDdk+bW48DW7Y
5gaRQBi5+MHt39tBquCWIMnNZBU4gcmU7qKEKQsTb47bDN0lAtukixlE0kF6BWlK
WE9gyn6CagsCqiUXObXbf+eEZSqVir2G3l6BFoMtEMze/aiCKm0oHw0LxOXnGiYZ
4fQRbxC1lfznQgUy286dUV4otp6F01vvpX1FQHKOtw5rDgb7MzVIcbidJ4vEZV8N
hnacRHr2lVz2XTIIM6RUthg/aFzyQkqFOFSDX9HoLPKsEdao7WNq
-----END CERTIFICATE-----
Entrust.net Secure Server Certification Authority
=================================================
-----BEGIN CERTIFICATE-----
MIIE2DCCBEGgAwIBAgIEN0rSQzANBgkqhkiG9w0BAQUFADCBwzELMAkGA1UEBhMC
VVMxFDASBgNVBAoTC0VudHJ1c3QubmV0MTswOQYDVQQLEzJ3d3cuZW50cnVzdC5u
ZXQvQ1BTIGluY29ycC4gYnkgcmVmLiAobGltaXRzIGxpYWIuKTElMCMGA1UECxMc
KGMpIDE5OTkgRW50cnVzdC5uZXQgTGltaXRlZDE6MDgGA1UEAxMxRW50cnVzdC5u
ZXQgU2VjdXJlIFNlcnZlciBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eTAeFw05OTA1
MjUxNjA5NDBaFw0xOTA1MjUxNjM5NDBaMIHDMQswCQYDVQQGEwJVUzEUMBIGA1UE
ChMLRW50cnVzdC5uZXQxOzA5BgNVBAsTMnd3dy5lbnRydXN0Lm5ldC9DUFMgaW5j
b3JwLiBieSByZWYuIChsaW1pdHMgbGlhYi4pMSUwIwYDVQQLExwoYykgMTk5OSBF
bnRydXN0Lm5ldCBMaW1pdGVkMTowOAYDVQQDEzFFbnRydXN0Lm5ldCBTZWN1cmUg
U2VydmVyIENlcnRpZmljYXRpb24gQXV0aG9yaXR5MIGdMA0GCSqGSIb3DQEBAQUA
A4GLADCBhwKBgQDNKIM0VBuJ8w+vN5Ex/68xYMmo6LIQaO2f55M28Qpku0f1BBc/
I0dNxScZgSYMVHINiC3ZH5oSn7yzcdOAGT9HZnuMNSjSuQrfJNqc1lB5gXpa0zf3
wkrYKZImZNHkmGw6AIr1NJtl+O3jEP/9uElY3KDegjlrgbEWGWG5VLbmQwIBA6OC
AdcwggHTMBEGCWCGSAGG+EIBAQQEAwIABzCCARkGA1UdHwSCARAwggEMMIHeoIHb
oIHYpIHVMIHSMQswCQYDVQQGEwJVUzEUMBIGA1UEChMLRW50cnVzdC5uZXQxOzA5
BgNVBAsTMnd3dy5lbnRydXN0Lm5ldC9DUFMgaW5jb3JwLiBieSByZWYuIChsaW1p
dHMgbGlhYi4pMSUwIwYDVQQLExwoYykgMTk5OSBFbnRydXN0Lm5ldCBMaW1pdGVk
MTowOAYDVQQDEzFFbnRydXN0Lm5ldCBTZWN1cmUgU2VydmVyIENlcnRpZmljYXRp
b24gQXV0aG9yaXR5MQ0wCwYDVQQDEwRDUkwxMCmgJ6AlhiNodHRwOi8vd3d3LmVu
dHJ1c3QubmV0L0NSTC9uZXQxLmNybDArBgNVHRAEJDAigA8xOTk5MDUyNTE2MDk0
MFqBDzIwMTkwNTI1MTYwOTQwWjALBgNVHQ8EBAMCAQYwHwYDVR0jBBgwFoAU8Bdi
E1U9s/8KAGv7UISX8+1i0BowHQYDVR0OBBYEFPAXYhNVPbP/CgBr+1CEl/PtYtAa
MAwGA1UdEwQFMAMBAf8wGQYJKoZIhvZ9B0EABAwwChsEVjQuMAMCBJAwDQYJKoZI
hvcNAQEFBQADgYEAkNwwAvpkdMKnCqV8IY00F6j7Rw7/JXyNEwr75Ji174z4xRAN
95K+8cPV1ZVqBLssziY2ZcgxxufuP+NXdYR6Ee9GTxj005i7qIcyunL2POI9n9cd
2cNgQ4xYDiKWL2KjLB+6rQXvqzJ4h6BUcxm1XAX5Uj5tLUUL9wqT6u0G+bI=
-----END CERTIFICATE-----
Go Daddy Certification Authority Root Certificate Bundle
========================================================
-----BEGIN CERTIFICATE-----
MIIE3jCCA8agAwIBAgICAwEwDQYJKoZIhvcNAQEFBQAwYzELMAkGA1UEBhMCVVMx
ITAfBgNVBAoTGFRoZSBHbyBEYWRkeSBHcm91cCwgSW5jLjExMC8GA1UECxMoR28g
RGFkZHkgQ2xhc3MgMiBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eTAeFw0wNjExMTYw
MTU0MzdaFw0yNjExMTYwMTU0MzdaMIHKMQswCQYDVQQGEwJVUzEQMA4GA1UECBMH
QXJpem9uYTETMBEGA1UEBxMKU2NvdHRzZGFsZTEaMBgGA1UEChMRR29EYWRkeS5j
b20sIEluYy4xMzAxBgNVBAsTKmh0dHA6Ly9jZXJ0aWZpY2F0ZXMuZ29kYWRkeS5j
b20vcmVwb3NpdG9yeTEwMC4GA1UEAxMnR28gRGFkZHkgU2VjdXJlIENlcnRpZmlj
YXRpb24gQXV0aG9yaXR5MREwDwYDVQQFEwgwNzk2OTI4NzCCASIwDQYJKoZIhvcN
AQEBBQADggEPADCCAQoCggEBAMQt1RWMnCZM7DI161+4WQFapmGBWTtwY6vj3D3H
KrjJM9N55DrtPDAjhI6zMBS2sofDPZVUBJ7fmd0LJR4h3mUpfjWoqVTr9vcyOdQm
VZWt7/v+WIbXnvQAjYwqDL1CBM6nPwT27oDyqu9SoWlm2r4arV3aLGbqGmu75RpR
SgAvSMeYddi5Kcju+GZtCpyz8/x4fKL4o/K1w/O5epHBp+YlLpyo7RJlbmr2EkRT
cDCVw5wrWCs9CHRK8r5RsL+H0EwnWGu1NcWdrxcx+AuP7q2BNgWJCJjPOq8lh8BJ
6qf9Z/dFjpfMFDniNoW1fho3/Rb2cRGadDAW/hOUoz+EDU8CAwEAAaOCATIwggEu
MB0GA1UdDgQWBBT9rGEyk2xF1uLuhV+auud2mWjM5zAfBgNVHSMEGDAWgBTSxLDS
kdRMEXGzYcs9of7dqGrU4zASBgNVHRMBAf8ECDAGAQH/AgEAMDMGCCsGAQUFBwEB
BCcwJTAjBggrBgEFBQcwAYYXaHR0cDovL29jc3AuZ29kYWRkeS5jb20wRgYDVR0f
BD8wPTA7oDmgN4Y1aHR0cDovL2NlcnRpZmljYXRlcy5nb2RhZGR5LmNvbS9yZXBv
c2l0b3J5L2dkcm9vdC5jcmwwSwYDVR0gBEQwQjBABgRVHSAAMDgwNgYIKwYBBQUH
AgEWKmh0dHA6Ly9jZXJ0aWZpY2F0ZXMuZ29kYWRkeS5jb20vcmVwb3NpdG9yeTAO
BgNVHQ8BAf8EBAMCAQYwDQYJKoZIhvcNAQEFBQADggEBANKGwOy9+aG2Z+5mC6IG
OgRQjhVyrEp0lVPLN8tESe8HkGsz2ZbwlFalEzAFPIUyIXvJxwqoJKSQ3kbTJSMU
A2fCENZvD117esyfxVgqwcSeIaha86ykRvOe5GPLL5CkKSkB2XIsKd83ASe8T+5o
0yGPwLPk9Qnt0hCqU7S+8MxZC9Y7lhyVJEnfzuz9p0iRFEUOOjZv2kWzRaJBydTX
RE4+uXR21aITVSzGh6O1mawGhId/dQb8vxRMDsxuxN89txJx9OjxUUAiKEngHUuH
qDTMBqLdElrRhjZkAzVvb3du6/KFUJheqwNTrZEjYx8WnM25sgVjOuH0aBsXBTWV
U+4=
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIE+zCCBGSgAwIBAgICAQ0wDQYJKoZIhvcNAQEFBQAwgbsxJDAiBgNVBAcTG1Zh
bGlDZXJ0IFZhbGlkYXRpb24gTmV0d29yazEXMBUGA1UEChMOVmFsaUNlcnQsIElu
Yy4xNTAzBgNVBAsTLFZhbGlDZXJ0IENsYXNzIDIgUG9saWN5IFZhbGlkYXRpb24g
QXV0aG9yaXR5MSEwHwYDVQQDExhodHRwOi8vd3d3LnZhbGljZXJ0LmNvbS8xIDAe
BgkqhkiG9w0BCQEWEWluZm9AdmFsaWNlcnQuY29tMB4XDTA0MDYyOTE3MDYyMFoX
DTI0MDYyOTE3MDYyMFowYzELMAkGA1UEBhMCVVMxITAfBgNVBAoTGFRoZSBHbyBE
YWRkeSBHcm91cCwgSW5jLjExMC8GA1UECxMoR28gRGFkZHkgQ2xhc3MgMiBDZXJ0
aWZpY2F0aW9uIEF1dGhvcml0eTCCASAwDQYJKoZIhvcNAQEBBQADggENADCCAQgC
ggEBAN6d1+pXGEmhW+vXX0iG6r7d/+TvZxz0ZWizV3GgXne77ZtJ6XCAPVYYYwhv
2vLM0D9/AlQiVBDYsoHUwHU9S3/Hd8M+eKsaA7Ugay9qK7HFiH7Eux6wwdhFJ2+q
N1j3hybX2C32qRe3H3I2TqYXP2WYktsqbl2i/ojgC95/5Y0V4evLOtXiEqITLdiO
r18SPaAIBQi2XKVlOARFmR6jYGB0xUGlcmIbYsUfb18aQr4CUWWoriMYavx4A6lN
f4DD+qta/KFApMoZFv6yyO9ecw3ud72a9nmYvLEHZ6IVDd2gWMZEewo+YihfukEH
U1jPEX44dMX4/7VpkI+EdOqXG68CAQOjggHhMIIB3TAdBgNVHQ4EFgQU0sSw0pHU
TBFxs2HLPaH+3ahq1OMwgdIGA1UdIwSByjCBx6GBwaSBvjCBuzEkMCIGA1UEBxMb
VmFsaUNlcnQgVmFsaWRhdGlvbiBOZXR3b3JrMRcwFQYDVQQKEw5WYWxpQ2VydCwg
SW5jLjE1MDMGA1UECxMsVmFsaUNlcnQgQ2xhc3MgMiBQb2xpY3kgVmFsaWRhdGlv
biBBdXRob3JpdHkxITAfBgNVBAMTGGh0dHA6Ly93d3cudmFsaWNlcnQuY29tLzEg
MB4GCSqGSIb3DQEJARYRaW5mb0B2YWxpY2VydC5jb22CAQEwDwYDVR0TAQH/BAUw
AwEB/zAzBggrBgEFBQcBAQQnMCUwIwYIKwYBBQUHMAGGF2h0dHA6Ly9vY3NwLmdv
ZGFkZHkuY29tMEQGA1UdHwQ9MDswOaA3oDWGM2h0dHA6Ly9jZXJ0aWZpY2F0ZXMu
Z29kYWRkeS5jb20vcmVwb3NpdG9yeS9yb290LmNybDBLBgNVHSAERDBCMEAGBFUd
IAAwODA2BggrBgEFBQcCARYqaHR0cDovL2NlcnRpZmljYXRlcy5nb2RhZGR5LmNv
bS9yZXBvc2l0b3J5MA4GA1UdDwEB/wQEAwIBBjANBgkqhkiG9w0BAQUFAAOBgQC1
QPmnHfbq/qQaQlpE9xXUhUaJwL6e4+PrxeNYiY+Sn1eocSxI0YGyeR+sBjUZsE4O
WBsUs5iB0QQeyAfJg594RAoYC5jcdnplDQ1tgMQLARzLrUc+cb53S8wGd9D0Vmsf
SxOaFIqII6hR8INMqzW/Rn453HWkrugp++85j09VZw==
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIC5zCCAlACAQEwDQYJKoZIhvcNAQEFBQAwgbsxJDAiBgNVBAcTG1ZhbGlDZXJ0
IFZhbGlkYXRpb24gTmV0d29yazEXMBUGA1UEChMOVmFsaUNlcnQsIEluYy4xNTAz
BgNVBAsTLFZhbGlDZXJ0IENsYXNzIDIgUG9saWN5IFZhbGlkYXRpb24gQXV0aG9y
aXR5MSEwHwYDVQQDExhodHRwOi8vd3d3LnZhbGljZXJ0LmNvbS8xIDAeBgkqhkiG
9w0BCQEWEWluZm9AdmFsaWNlcnQuY29tMB4XDTk5MDYyNjAwMTk1NFoXDTE5MDYy
NjAwMTk1NFowgbsxJDAiBgNVBAcTG1ZhbGlDZXJ0IFZhbGlkYXRpb24gTmV0d29y
azEXMBUGA1UEChMOVmFsaUNlcnQsIEluYy4xNTAzBgNVBAsTLFZhbGlDZXJ0IENs
YXNzIDIgUG9saWN5IFZhbGlkYXRpb24gQXV0aG9yaXR5MSEwHwYDVQQDExhodHRw
Oi8vd3d3LnZhbGljZXJ0LmNvbS8xIDAeBgkqhkiG9w0BCQEWEWluZm9AdmFsaWNl
cnQuY29tMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDOOnHK5avIWZJV16vY
dA757tn2VUdZZUcOBVXc65g2PFxTXdMwzzjsvUGJ7SVCCSRrCl6zfN1SLUzm1NZ9
WlmpZdRJEy0kTRxQb7XBhVQ7/nHk01xC+YDgkRoKWzk2Z/M/VXwbP7RfZHM047QS
v4dk+NoS/zcnwbNDu+97bi5p9wIDAQABMA0GCSqGSIb3DQEBBQUAA4GBADt/UG9v
UJSZSWI4OB9L+KXIPqeCgfYrx+jFzug6EILLGACOTb2oWH+heQC1u+mNr0HZDzTu
IYEZoDJJKPTEjlbVUjP9UNV+mWwD5MlM/Mtsq2azSiGM5bUMMj4QssxsodyamEwC
W/POuZ6lcg5Ktz885hZo+L7tdEy8W9ViH0Pd
-----END CERTIFICATE-----
GeoTrust Global CA
==================
-----BEGIN CERTIFICATE-----
MIIDfTCCAuagAwIBAgIDErvmMA0GCSqGSIb3DQEBBQUAME4xCzAJBgNVBAYTAlVT
MRAwDgYDVQQKEwdFcXVpZmF4MS0wKwYDVQQLEyRFcXVpZmF4IFNlY3VyZSBDZXJ0
aWZpY2F0ZSBBdXRob3JpdHkwHhcNMDIwNTIxMDQwMDAwWhcNMTgwODIxMDQwMDAw
WjBCMQswCQYDVQQGEwJVUzEWMBQGA1UEChMNR2VvVHJ1c3QgSW5jLjEbMBkGA1UE
AxMSR2VvVHJ1c3QgR2xvYmFsIENBMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIB
CgKCAQEA2swYYzD99BcjGlZ+W988bDjkcbd4kdS8odhM+KhDtgPpTSEHCIjaWC9m
OSm9BXiLnTjoBbdqfnGk5sRgprDvgOSJKA+eJdbtg/OtppHHmMlCGDUUna2YRpIu
T8rxh0PBFpVXLVDviS2Aelet8u5fa9IAjbkU+BQVNdnARqN7csiRv8lVK83Qlz6c
JmTM386DGXHKTubU1XupGc1V3sjs0l44U+VcT4wt/lAjNvxm5suOpDkZALeVAjmR
Cw7+OC7RHQWa9k0+bw8HHa8sHo9gOeL6NlMTOdReJivbPagUvTLrGAMoUgRx5asz
PeE4uwc2hGKceeoWMPRfwCvocWvk+QIDAQABo4HwMIHtMB8GA1UdIwQYMBaAFEjm
aPkr0rKV10fYIyAQTzOYkJ/UMB0GA1UdDgQWBBTAephojYn7qwVkDBF9qn1luMrM
TjAPBgNVHRMBAf8EBTADAQH/MA4GA1UdDwEB/wQEAwIBBjA6BgNVHR8EMzAxMC+g
LaArhilodHRwOi8vY3JsLmdlb3RydXN0LmNvbS9jcmxzL3NlY3VyZWNhLmNybDBO
BgNVHSAERzBFMEMGBFUdIAAwOzA5BggrBgEFBQcCARYtaHR0cHM6Ly93d3cuZ2Vv
dHJ1c3QuY29tL3Jlc291cmNlcy9yZXBvc2l0b3J5MA0GCSqGSIb3DQEBBQUAA4GB
AHbhEm5OSxYShjAGsoEIz/AIx8dxfmbuwu3UOx//8PDITtZDOLC5MH0Y0FWDomrL
NhGc6Ehmo21/uBPUR/6LWlxz/K7ZGzIZOKuXNBSqltLroxwUCEm2u+WR74M26x1W
b8ravHNjkOR/ez4iyz0H7V84dJzjA1BOoa+Y7mHyhD8S
-----END CERTIFICATE-----

View File

@@ -16,7 +16,7 @@ import urlparse
# Convert an IRI to a URI following the rules in RFC 3987
#
#
# The characters we need to enocde and escape are defined in the spec:
#
# iprivate = %xE000-F8FF / %xF0000-FFFFD / %x100000-10FFFD
@@ -28,28 +28,28 @@ import urlparse
# / %xD0000-DFFFD / %xE1000-EFFFD
escape_range = [
(0xA0, 0xD7FF ),
(0xE000, 0xF8FF ),
(0xF900, 0xFDCF ),
(0xFDF0, 0xFFEF),
(0x10000, 0x1FFFD ),
(0x20000, 0x2FFFD ),
(0x30000, 0x3FFFD),
(0x40000, 0x4FFFD ),
(0x50000, 0x5FFFD ),
(0x60000, 0x6FFFD),
(0x70000, 0x7FFFD ),
(0x80000, 0x8FFFD ),
(0x90000, 0x9FFFD),
(0xA0000, 0xAFFFD ),
(0xB0000, 0xBFFFD ),
(0xC0000, 0xCFFFD),
(0xD0000, 0xDFFFD ),
(0xE1000, 0xEFFFD),
(0xF0000, 0xFFFFD ),
(0x100000, 0x10FFFD)
(0xA0, 0xD7FF),
(0xE000, 0xF8FF),
(0xF900, 0xFDCF),
(0xFDF0, 0xFFEF),
(0x10000, 0x1FFFD),
(0x20000, 0x2FFFD),
(0x30000, 0x3FFFD),
(0x40000, 0x4FFFD),
(0x50000, 0x5FFFD),
(0x60000, 0x6FFFD),
(0x70000, 0x7FFFD),
(0x80000, 0x8FFFD),
(0x90000, 0x9FFFD),
(0xA0000, 0xAFFFD),
(0xB0000, 0xBFFFD),
(0xC0000, 0xCFFFD),
(0xD0000, 0xDFFFD),
(0xE1000, 0xEFFFD),
(0xF0000, 0xFFFFD),
(0x100000, 0x10FFFD),
]
def encode(c):
retval = c
i = ord(c)
@@ -63,19 +63,19 @@ def encode(c):
def iri2uri(uri):
"""Convert an IRI to a URI. Note that IRIs must be
"""Convert an IRI to a URI. Note that IRIs must be
passed in a unicode strings. That is, do not utf-8 encode
the IRI before passing it into the function."""
the IRI before passing it into the function."""
if isinstance(uri ,unicode):
(scheme, authority, path, query, fragment) = urlparse.urlsplit(uri)
authority = authority.encode('idna')
# For each character in 'ucschar' or 'iprivate'
# 1. encode as utf-8
# 2. then %-encode each octet of that utf-8
# 2. then %-encode each octet of that utf-8
uri = urlparse.urlunsplit((scheme, authority, path, query, fragment))
uri = "".join([encode(c) for c in uri])
return uri
if __name__ == "__main__":
import unittest
@@ -83,7 +83,7 @@ if __name__ == "__main__":
def test_uris(self):
"""Test that URIs are invariant under the transformation."""
invariant = [
invariant = [
u"ftp://ftp.is.co.za/rfc/rfc1808.txt",
u"http://www.ietf.org/rfc/rfc2396.txt",
u"ldap://[2001:db8::7]/c=GB?objectClass?one",
@@ -94,7 +94,7 @@ if __name__ == "__main__":
u"urn:oasis:names:specification:docbook:dtd:xml:4.1.2" ]
for uri in invariant:
self.assertEqual(uri, iri2uri(uri))
def test_iri(self):
""" Test that the right type of escaping is done for each part of the URI."""
self.assertEqual("http://xn--o3h.com/%E2%98%84", iri2uri(u"http://\N{COMET}.com/\N{COMET}"))
@@ -107,4 +107,4 @@ if __name__ == "__main__":
unittest.main()

438
libs/httplib2/socks.py Normal file
View File

@@ -0,0 +1,438 @@
"""SocksiPy - Python SOCKS module.
Version 1.00
Copyright 2006 Dan-Haim. All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of Dan Haim nor the names of his contributors may be used
to endorse or promote products derived from this software without specific
prior written permission.
THIS SOFTWARE IS PROVIDED BY DAN HAIM "AS IS" AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
EVENT SHALL DAN HAIM OR HIS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA
OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMANGE.
This module provides a standard socket-like interface for Python
for tunneling connections through SOCKS proxies.
"""
"""
Minor modifications made by Christopher Gilbert (http://motomastyle.com/)
for use in PyLoris (http://pyloris.sourceforge.net/)
Minor modifications made by Mario Vilas (http://breakingcode.wordpress.com/)
mainly to merge bug fixes found in Sourceforge
"""
import base64
import socket
import struct
import sys
if getattr(socket, 'socket', None) is None:
raise ImportError('socket.socket missing, proxy support unusable')
PROXY_TYPE_SOCKS4 = 1
PROXY_TYPE_SOCKS5 = 2
PROXY_TYPE_HTTP = 3
PROXY_TYPE_HTTP_NO_TUNNEL = 4
_defaultproxy = None
_orgsocket = socket.socket
class ProxyError(Exception): pass
class GeneralProxyError(ProxyError): pass
class Socks5AuthError(ProxyError): pass
class Socks5Error(ProxyError): pass
class Socks4Error(ProxyError): pass
class HTTPError(ProxyError): pass
_generalerrors = ("success",
"invalid data",
"not connected",
"not available",
"bad proxy type",
"bad input")
_socks5errors = ("succeeded",
"general SOCKS server failure",
"connection not allowed by ruleset",
"Network unreachable",
"Host unreachable",
"Connection refused",
"TTL expired",
"Command not supported",
"Address type not supported",
"Unknown error")
_socks5autherrors = ("succeeded",
"authentication is required",
"all offered authentication methods were rejected",
"unknown username or invalid password",
"unknown error")
_socks4errors = ("request granted",
"request rejected or failed",
"request rejected because SOCKS server cannot connect to identd on the client",
"request rejected because the client program and identd report different user-ids",
"unknown error")
def setdefaultproxy(proxytype=None, addr=None, port=None, rdns=True, username=None, password=None):
"""setdefaultproxy(proxytype, addr[, port[, rdns[, username[, password]]]])
Sets a default proxy which all further socksocket objects will use,
unless explicitly changed.
"""
global _defaultproxy
_defaultproxy = (proxytype, addr, port, rdns, username, password)
def wrapmodule(module):
"""wrapmodule(module)
Attempts to replace a module's socket library with a SOCKS socket. Must set
a default proxy using setdefaultproxy(...) first.
This will only work on modules that import socket directly into the namespace;
most of the Python Standard Library falls into this category.
"""
if _defaultproxy != None:
module.socket.socket = socksocket
else:
raise GeneralProxyError((4, "no proxy specified"))
class socksocket(socket.socket):
"""socksocket([family[, type[, proto]]]) -> socket object
Open a SOCKS enabled socket. The parameters are the same as
those of the standard socket init. In order for SOCKS to work,
you must specify family=AF_INET, type=SOCK_STREAM and proto=0.
"""
def __init__(self, family=socket.AF_INET, type=socket.SOCK_STREAM, proto=0, _sock=None):
_orgsocket.__init__(self, family, type, proto, _sock)
if _defaultproxy != None:
self.__proxy = _defaultproxy
else:
self.__proxy = (None, None, None, None, None, None)
self.__proxysockname = None
self.__proxypeername = None
self.__httptunnel = True
def __recvall(self, count):
"""__recvall(count) -> data
Receive EXACTLY the number of bytes requested from the socket.
Blocks until the required number of bytes have been received.
"""
data = self.recv(count)
while len(data) < count:
d = self.recv(count-len(data))
if not d: raise GeneralProxyError((0, "connection closed unexpectedly"))
data = data + d
return data
def sendall(self, content, *args):
""" override socket.socket.sendall method to rewrite the header
for non-tunneling proxies if needed
"""
if not self.__httptunnel:
content = self.__rewriteproxy(content)
return super(socksocket, self).sendall(content, *args)
def __rewriteproxy(self, header):
""" rewrite HTTP request headers to support non-tunneling proxies
(i.e. those which do not support the CONNECT method).
This only works for HTTP (not HTTPS) since HTTPS requires tunneling.
"""
host, endpt = None, None
hdrs = header.split("\r\n")
for hdr in hdrs:
if hdr.lower().startswith("host:"):
host = hdr
elif hdr.lower().startswith("get") or hdr.lower().startswith("post"):
endpt = hdr
if host and endpt:
hdrs.remove(host)
hdrs.remove(endpt)
host = host.split(" ")[1]
endpt = endpt.split(" ")
if (self.__proxy[4] != None and self.__proxy[5] != None):
hdrs.insert(0, self.__getauthheader())
hdrs.insert(0, "Host: %s" % host)
hdrs.insert(0, "%s http://%s%s %s" % (endpt[0], host, endpt[1], endpt[2]))
return "\r\n".join(hdrs)
def __getauthheader(self):
auth = self.__proxy[4] + ":" + self.__proxy[5]
return "Proxy-Authorization: Basic " + base64.b64encode(auth)
def setproxy(self, proxytype=None, addr=None, port=None, rdns=True, username=None, password=None):
"""setproxy(proxytype, addr[, port[, rdns[, username[, password]]]])
Sets the proxy to be used.
proxytype - The type of the proxy to be used. Three types
are supported: PROXY_TYPE_SOCKS4 (including socks4a),
PROXY_TYPE_SOCKS5 and PROXY_TYPE_HTTP
addr - The address of the server (IP or DNS).
port - The port of the server. Defaults to 1080 for SOCKS
servers and 8080 for HTTP proxy servers.
rdns - Should DNS queries be preformed on the remote side
(rather than the local side). The default is True.
Note: This has no effect with SOCKS4 servers.
username - Username to authenticate with to the server.
The default is no authentication.
password - Password to authenticate with to the server.
Only relevant when username is also provided.
"""
self.__proxy = (proxytype, addr, port, rdns, username, password)
def __negotiatesocks5(self, destaddr, destport):
"""__negotiatesocks5(self,destaddr,destport)
Negotiates a connection through a SOCKS5 server.
"""
# First we'll send the authentication packages we support.
if (self.__proxy[4]!=None) and (self.__proxy[5]!=None):
# The username/password details were supplied to the
# setproxy method so we support the USERNAME/PASSWORD
# authentication (in addition to the standard none).
self.sendall(struct.pack('BBBB', 0x05, 0x02, 0x00, 0x02))
else:
# No username/password were entered, therefore we
# only support connections with no authentication.
self.sendall(struct.pack('BBB', 0x05, 0x01, 0x00))
# We'll receive the server's response to determine which
# method was selected
chosenauth = self.__recvall(2)
if chosenauth[0:1] != chr(0x05).encode():
self.close()
raise GeneralProxyError((1, _generalerrors[1]))
# Check the chosen authentication method
if chosenauth[1:2] == chr(0x00).encode():
# No authentication is required
pass
elif chosenauth[1:2] == chr(0x02).encode():
# Okay, we need to perform a basic username/password
# authentication.
self.sendall(chr(0x01).encode() + chr(len(self.__proxy[4])) + self.__proxy[4] + chr(len(self.__proxy[5])) + self.__proxy[5])
authstat = self.__recvall(2)
if authstat[0:1] != chr(0x01).encode():
# Bad response
self.close()
raise GeneralProxyError((1, _generalerrors[1]))
if authstat[1:2] != chr(0x00).encode():
# Authentication failed
self.close()
raise Socks5AuthError((3, _socks5autherrors[3]))
# Authentication succeeded
else:
# Reaching here is always bad
self.close()
if chosenauth[1] == chr(0xFF).encode():
raise Socks5AuthError((2, _socks5autherrors[2]))
else:
raise GeneralProxyError((1, _generalerrors[1]))
# Now we can request the actual connection
req = struct.pack('BBB', 0x05, 0x01, 0x00)
# If the given destination address is an IP address, we'll
# use the IPv4 address request even if remote resolving was specified.
try:
ipaddr = socket.inet_aton(destaddr)
req = req + chr(0x01).encode() + ipaddr
except socket.error:
# Well it's not an IP number, so it's probably a DNS name.
if self.__proxy[3]:
# Resolve remotely
ipaddr = None
req = req + chr(0x03).encode() + chr(len(destaddr)).encode() + destaddr
else:
# Resolve locally
ipaddr = socket.inet_aton(socket.gethostbyname(destaddr))
req = req + chr(0x01).encode() + ipaddr
req = req + struct.pack(">H", destport)
self.sendall(req)
# Get the response
resp = self.__recvall(4)
if resp[0:1] != chr(0x05).encode():
self.close()
raise GeneralProxyError((1, _generalerrors[1]))
elif resp[1:2] != chr(0x00).encode():
# Connection failed
self.close()
if ord(resp[1:2])<=8:
raise Socks5Error((ord(resp[1:2]), _socks5errors[ord(resp[1:2])]))
else:
raise Socks5Error((9, _socks5errors[9]))
# Get the bound address/port
elif resp[3:4] == chr(0x01).encode():
boundaddr = self.__recvall(4)
elif resp[3:4] == chr(0x03).encode():
resp = resp + self.recv(1)
boundaddr = self.__recvall(ord(resp[4:5]))
else:
self.close()
raise GeneralProxyError((1,_generalerrors[1]))
boundport = struct.unpack(">H", self.__recvall(2))[0]
self.__proxysockname = (boundaddr, boundport)
if ipaddr != None:
self.__proxypeername = (socket.inet_ntoa(ipaddr), destport)
else:
self.__proxypeername = (destaddr, destport)
def getproxysockname(self):
"""getsockname() -> address info
Returns the bound IP address and port number at the proxy.
"""
return self.__proxysockname
def getproxypeername(self):
"""getproxypeername() -> address info
Returns the IP and port number of the proxy.
"""
return _orgsocket.getpeername(self)
def getpeername(self):
"""getpeername() -> address info
Returns the IP address and port number of the destination
machine (note: getproxypeername returns the proxy)
"""
return self.__proxypeername
def __negotiatesocks4(self,destaddr,destport):
"""__negotiatesocks4(self,destaddr,destport)
Negotiates a connection through a SOCKS4 server.
"""
# Check if the destination address provided is an IP address
rmtrslv = False
try:
ipaddr = socket.inet_aton(destaddr)
except socket.error:
# It's a DNS name. Check where it should be resolved.
if self.__proxy[3]:
ipaddr = struct.pack("BBBB", 0x00, 0x00, 0x00, 0x01)
rmtrslv = True
else:
ipaddr = socket.inet_aton(socket.gethostbyname(destaddr))
# Construct the request packet
req = struct.pack(">BBH", 0x04, 0x01, destport) + ipaddr
# The username parameter is considered userid for SOCKS4
if self.__proxy[4] != None:
req = req + self.__proxy[4]
req = req + chr(0x00).encode()
# DNS name if remote resolving is required
# NOTE: This is actually an extension to the SOCKS4 protocol
# called SOCKS4A and may not be supported in all cases.
if rmtrslv:
req = req + destaddr + chr(0x00).encode()
self.sendall(req)
# Get the response from the server
resp = self.__recvall(8)
if resp[0:1] != chr(0x00).encode():
# Bad data
self.close()
raise GeneralProxyError((1,_generalerrors[1]))
if resp[1:2] != chr(0x5A).encode():
# Server returned an error
self.close()
if ord(resp[1:2]) in (91, 92, 93):
self.close()
raise Socks4Error((ord(resp[1:2]), _socks4errors[ord(resp[1:2]) - 90]))
else:
raise Socks4Error((94, _socks4errors[4]))
# Get the bound address/port
self.__proxysockname = (socket.inet_ntoa(resp[4:]), struct.unpack(">H", resp[2:4])[0])
if rmtrslv != None:
self.__proxypeername = (socket.inet_ntoa(ipaddr), destport)
else:
self.__proxypeername = (destaddr, destport)
def __negotiatehttp(self, destaddr, destport):
"""__negotiatehttp(self,destaddr,destport)
Negotiates a connection through an HTTP server.
"""
# If we need to resolve locally, we do this now
if not self.__proxy[3]:
addr = socket.gethostbyname(destaddr)
else:
addr = destaddr
headers = ["CONNECT ", addr, ":", str(destport), " HTTP/1.1\r\n"]
headers += ["Host: ", destaddr, "\r\n"]
if (self.__proxy[4] != None and self.__proxy[5] != None):
headers += [self.__getauthheader(), "\r\n"]
headers.append("\r\n")
self.sendall("".join(headers).encode())
# We read the response until we get the string "\r\n\r\n"
resp = self.recv(1)
while resp.find("\r\n\r\n".encode()) == -1:
resp = resp + self.recv(1)
# We just need the first line to check if the connection
# was successful
statusline = resp.splitlines()[0].split(" ".encode(), 2)
if statusline[0] not in ("HTTP/1.0".encode(), "HTTP/1.1".encode()):
self.close()
raise GeneralProxyError((1, _generalerrors[1]))
try:
statuscode = int(statusline[1])
except ValueError:
self.close()
raise GeneralProxyError((1, _generalerrors[1]))
if statuscode != 200:
self.close()
raise HTTPError((statuscode, statusline[2]))
self.__proxysockname = ("0.0.0.0", 0)
self.__proxypeername = (addr, destport)
def connect(self, destpair):
"""connect(self, despair)
Connects to the specified destination through a proxy.
destpar - A tuple of the IP/DNS address and the port number.
(identical to socket's connect).
To select the proxy server use setproxy().
"""
# Do a minimal input check first
if (not type(destpair) in (list,tuple)) or (len(destpair) < 2) or (not isinstance(destpair[0], basestring)) or (type(destpair[1]) != int):
raise GeneralProxyError((5, _generalerrors[5]))
if self.__proxy[0] == PROXY_TYPE_SOCKS5:
if self.__proxy[2] != None:
portnum = self.__proxy[2]
else:
portnum = 1080
_orgsocket.connect(self, (self.__proxy[1], portnum))
self.__negotiatesocks5(destpair[0], destpair[1])
elif self.__proxy[0] == PROXY_TYPE_SOCKS4:
if self.__proxy[2] != None:
portnum = self.__proxy[2]
else:
portnum = 1080
_orgsocket.connect(self,(self.__proxy[1], portnum))
self.__negotiatesocks4(destpair[0], destpair[1])
elif self.__proxy[0] == PROXY_TYPE_HTTP:
if self.__proxy[2] != None:
portnum = self.__proxy[2]
else:
portnum = 8080
_orgsocket.connect(self,(self.__proxy[1], portnum))
self.__negotiatehttp(destpair[0], destpair[1])
elif self.__proxy[0] == PROXY_TYPE_HTTP_NO_TUNNEL:
if self.__proxy[2] != None:
portnum = self.__proxy[2]
else:
portnum = 8080
_orgsocket.connect(self,(self.__proxy[1],portnum))
if destpair[1] == 443:
self.__negotiatehttp(destpair[0],destpair[1])
else:
self.__httptunnel = False
elif self.__proxy[0] == None:
_orgsocket.connect(self, (destpair[0], destpair[1]))
else:
raise GeneralProxyError((4, _generalerrors[4]))

View File

@@ -1 +1,8 @@
majorVersionId = '1'
import sys
# http://www.python.org/dev/peps/pep-0396/
__version__ = '0.1.7'
if sys.version_info[:2] < (2, 4):
raise RuntimeError('PyASN1 requires Python 2.4 or later')

View File

@@ -0,0 +1 @@
# This file is necessary to make this directory a package.

View File

@@ -0,0 +1 @@
# This file is necessary to make this directory a package.

View File

@@ -1,21 +1,24 @@
# BER decoder
from pyasn1.type import tag, base, univ, char, useful, tagmap
from pyasn1.codec.ber import eoo
from pyasn1.compat.octets import oct2int, octs2ints
from pyasn1 import error
from pyasn1.compat.octets import oct2int, octs2ints, isOctetsType
from pyasn1 import debug, error
class AbstractDecoder:
protoComponent = None
def valueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun):
raise error.PyAsn1Error('Decoder not implemented for %s' % tagSet)
length, state, decodeFun, substrateFun):
raise error.PyAsn1Error('Decoder not implemented for %s' % (tagSet,))
def indefLenValueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun):
raise error.PyAsn1Error('Indefinite length mode decoder not implemented for %s' % tagSet)
length, state, decodeFun, substrateFun):
raise error.PyAsn1Error('Indefinite length mode decoder not implemented for %s' % (tagSet,))
class AbstractSimpleDecoder(AbstractDecoder):
tagFormats = (tag.tagFormatSimple,)
def _createComponent(self, asn1Spec, tagSet, value=None):
if tagSet[0][1] not in self.tagFormats:
raise error.PyAsn1Error('Invalid tag format %r for %r' % (tagSet[0], self.protoComponent,))
if asn1Spec is None:
return self.protoComponent.clone(value, tagSet)
elif value is None:
@@ -24,7 +27,10 @@ class AbstractSimpleDecoder(AbstractDecoder):
return asn1Spec.clone(value)
class AbstractConstructedDecoder(AbstractDecoder):
tagFormats = (tag.tagFormatConstructed,)
def _createComponent(self, asn1Spec, tagSet, value=None):
if tagSet[0][1] not in self.tagFormats:
raise error.PyAsn1Error('Invalid tag format %r for %r' % (tagSet[0], self.protoComponent,))
if asn1Spec is None:
return self.protoComponent.clone(tagSet)
else:
@@ -32,19 +38,34 @@ class AbstractConstructedDecoder(AbstractDecoder):
class EndOfOctetsDecoder(AbstractSimpleDecoder):
def valueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun):
return eoo.endOfOctets, substrate[:length]
length, state, decodeFun, substrateFun):
return eoo.endOfOctets, substrate[length:]
class ExplicitTagDecoder(AbstractSimpleDecoder):
protoComponent = univ.Any('')
tagFormats = (tag.tagFormatConstructed,)
def valueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun):
return decodeFun(substrate[:length], asn1Spec, tagSet, length)
length, state, decodeFun, substrateFun):
if substrateFun:
return substrateFun(
self._createComponent(asn1Spec, tagSet, ''),
substrate, length
)
head, tail = substrate[:length], substrate[length:]
value, _ = decodeFun(head, asn1Spec, tagSet, length)
return value, tail
def indefLenValueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun):
length, state, decodeFun, substrateFun):
if substrateFun:
return substrateFun(
self._createComponent(asn1Spec, tagSet, ''),
substrate, length
)
value, substrate = decodeFun(substrate, asn1Spec, tagSet, length)
terminator, substrate = decodeFun(substrate)
if terminator == eoo.endOfOctets:
if eoo.endOfOctets.isSameTypeWith(terminator) and \
terminator == eoo.endOfOctets:
return value, substrate
else:
raise error.PyAsn1Error('Missing end-of-octets terminator')
@@ -71,79 +92,71 @@ class IntegerDecoder(AbstractSimpleDecoder):
'\xfb': -5
}
def _valueFilter(self, value):
try:
return int(value)
except OverflowError:
return value
def valueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet, length,
state, decodeFun):
substrate = substrate[:length]
if not substrate:
raise error.PyAsn1Error('Empty substrate')
if substrate in self.precomputedValues:
value = self.precomputedValues[substrate]
state, decodeFun, substrateFun):
head, tail = substrate[:length], substrate[length:]
if not head:
return self._createComponent(asn1Spec, tagSet, 0), tail
if head in self.precomputedValues:
value = self.precomputedValues[head]
else:
firstOctet = oct2int(substrate[0])
firstOctet = oct2int(head[0])
if firstOctet & 0x80:
value = -1
else:
value = 0
for octet in substrate:
for octet in head:
value = value << 8 | oct2int(octet)
value = self._valueFilter(value)
return self._createComponent(asn1Spec, tagSet, value), substrate
return self._createComponent(asn1Spec, tagSet, value), tail
class BooleanDecoder(IntegerDecoder):
protoComponent = univ.Boolean(0)
def _valueFilter(self, value):
if value:
return 1
else:
return 0
def _createComponent(self, asn1Spec, tagSet, value=None):
return IntegerDecoder._createComponent(self, asn1Spec, tagSet, value and 1 or 0)
class BitStringDecoder(AbstractSimpleDecoder):
protoComponent = univ.BitString(())
tagFormats = (tag.tagFormatSimple, tag.tagFormatConstructed)
def valueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet, length,
state, decodeFun):
substrate = substrate[:length]
state, decodeFun, substrateFun):
head, tail = substrate[:length], substrate[length:]
if tagSet[0][1] == tag.tagFormatSimple: # XXX what tag to check?
if not substrate:
raise error.PyAsn1Error('Missing initial octet')
trailingBits = oct2int(substrate[0])
if not head:
raise error.PyAsn1Error('Empty substrate')
trailingBits = oct2int(head[0])
if trailingBits > 7:
raise error.PyAsn1Error(
'Trailing bits overflow %s' % trailingBits
)
substrate = substrate[1:]
lsb = p = 0; l = len(substrate)-1; b = ()
head = head[1:]
lsb = p = 0; l = len(head)-1; b = ()
while p <= l:
if p == l:
lsb = trailingBits
j = 7
o = oct2int(substrate[p])
o = oct2int(head[p])
while j >= lsb:
b = b + ((o>>j)&0x01,)
j = j - 1
p = p + 1
return self._createComponent(asn1Spec, tagSet, b), ''
return self._createComponent(asn1Spec, tagSet, b), tail
r = self._createComponent(asn1Spec, tagSet, ())
if not decodeFun:
return r, substrate
while substrate:
component, substrate = decodeFun(substrate)
if substrateFun:
return substrateFun(r, substrate, length)
while head:
component, head = decodeFun(head)
r = r + component
return r, substrate
return r, tail
def indefLenValueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun):
length, state, decodeFun, substrateFun):
r = self._createComponent(asn1Spec, tagSet, '')
if not decodeFun:
return r, substrate
if substrateFun:
return substrateFun(r, substrate, length)
while substrate:
component, substrate = decodeFun(substrate)
if component == eoo.endOfOctets:
if eoo.endOfOctets.isSameTypeWith(component) and \
component == eoo.endOfOctets:
break
r = r + component
else:
@@ -154,27 +167,29 @@ class BitStringDecoder(AbstractSimpleDecoder):
class OctetStringDecoder(AbstractSimpleDecoder):
protoComponent = univ.OctetString('')
tagFormats = (tag.tagFormatSimple, tag.tagFormatConstructed)
def valueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet, length,
state, decodeFun):
substrate = substrate[:length]
state, decodeFun, substrateFun):
head, tail = substrate[:length], substrate[length:]
if tagSet[0][1] == tag.tagFormatSimple: # XXX what tag to check?
return self._createComponent(asn1Spec, tagSet, substrate), ''
return self._createComponent(asn1Spec, tagSet, head), tail
r = self._createComponent(asn1Spec, tagSet, '')
if not decodeFun:
return r, substrate
while substrate:
component, substrate = decodeFun(substrate)
if substrateFun:
return substrateFun(r, substrate, length)
while head:
component, head = decodeFun(head)
r = r + component
return r, substrate
return r, tail
def indefLenValueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun):
length, state, decodeFun, substrateFun):
r = self._createComponent(asn1Spec, tagSet, '')
if not decodeFun:
return r, substrate
if substrateFun:
return substrateFun(r, substrate, length)
while substrate:
component, substrate = decodeFun(substrate)
if component == eoo.endOfOctets:
if eoo.endOfOctets.isSameTypeWith(component) and \
component == eoo.endOfOctets:
break
r = r + component
else:
@@ -186,93 +201,89 @@ class OctetStringDecoder(AbstractSimpleDecoder):
class NullDecoder(AbstractSimpleDecoder):
protoComponent = univ.Null('')
def valueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun):
substrate = substrate[:length]
length, state, decodeFun, substrateFun):
head, tail = substrate[:length], substrate[length:]
r = self._createComponent(asn1Spec, tagSet)
if substrate:
raise error.PyAsn1Error('Unexpected substrate for Null')
return r, substrate
if head:
raise error.PyAsn1Error('Unexpected %d-octet substrate for Null' % length)
return r, tail
class ObjectIdentifierDecoder(AbstractSimpleDecoder):
protoComponent = univ.ObjectIdentifier(())
def valueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet, length,
state, decodeFun):
substrate = substrate[:length]
if not substrate:
state, decodeFun, substrateFun):
head, tail = substrate[:length], substrate[length:]
if not head:
raise error.PyAsn1Error('Empty substrate')
oid = (); index = 0
# Get the first subid
subId = oct2int(substrate[index])
oid = oid + divmod(subId, 40)
index = index + 1
substrateLen = len(substrate)
# Get the first subid
subId = oct2int(head[0])
oid = divmod(subId, 40)
index = 1
substrateLen = len(head)
while index < substrateLen:
subId = oct2int(substrate[index])
if subId < 128:
oid = oid + (subId,)
index = index + 1
else:
subId = oct2int(head[index])
index = index + 1
if subId == 128:
# ASN.1 spec forbids leading zeros (0x80) in sub-ID OID
# encoding, tolerating it opens a vulnerability.
# See http://www.cosic.esat.kuleuven.be/publications/article-1432.pdf page 7
raise error.PyAsn1Error('Invalid leading 0x80 in sub-OID')
elif subId > 128:
# Construct subid from a number of octets
nextSubId = subId
subId = 0
while nextSubId >= 128 and index < substrateLen:
while nextSubId >= 128:
subId = (subId << 7) + (nextSubId & 0x7F)
if index >= substrateLen:
raise error.SubstrateUnderrunError(
'Short substrate for sub-OID past %s' % (oid,)
)
nextSubId = oct2int(head[index])
index = index + 1
nextSubId = oct2int(substrate[index])
if index == substrateLen:
raise error.SubstrateUnderrunError(
'Short substrate for OID %s' % oid
)
subId = (subId << 7) + nextSubId
oid = oid + (subId,)
index = index + 1
return self._createComponent(asn1Spec, tagSet, oid), substrate[index:]
oid = oid + (subId,)
return self._createComponent(asn1Spec, tagSet, oid), tail
class RealDecoder(AbstractSimpleDecoder):
protoComponent = univ.Real()
def valueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun):
substrate = substrate[:length]
if not length:
raise error.SubstrateUnderrunError('Short substrate for Real')
fo = oct2int(substrate[0]); substrate = substrate[1:]
if fo & 0x40: # infinite value
value = fo & 0x01 and '-inf' or 'inf'
elif fo & 0x80: # binary enoding
if fo & 0x11 == 0:
n = 1
elif fo & 0x01:
n = 2
elif fo & 0x02:
n = 3
else:
n = oct2int(substrate[0])
eo, substrate = substrate[:n], substrate[n:]
if not eo or not substrate:
length, state, decodeFun, substrateFun):
head, tail = substrate[:length], substrate[length:]
if not head:
return self._createComponent(asn1Spec, tagSet, 0.0), tail
fo = oct2int(head[0]); head = head[1:]
if fo & 0x80: # binary enoding
n = (fo & 0x03) + 1
if n == 4:
n = oct2int(head[0])
eo, head = head[:n], head[n:]
if not eo or not head:
raise error.PyAsn1Error('Real exponent screwed')
e = 0
e = oct2int(eo[0]) & 0x80 and -1 or 0
while eo: # exponent
e <<= 8
e |= oct2int(eo[0])
eo = eo[1:]
p = 0
while substrate: # value
while head: # value
p <<= 8
p |= oct2int(substrate[0])
substrate = substrate[1:]
p |= oct2int(head[0])
head = head[1:]
if fo & 0x40: # sign bit
p = -p
value = (p, 2, e)
elif fo & 0x40: # infinite value
value = fo & 0x01 and '-inf' or 'inf'
elif fo & 0xc0 == 0: # character encoding
try:
if fo & 0x3 == 0x1: # NR1
value = (int(substrate), 10, 0)
value = (int(head), 10, 0)
elif fo & 0x3 == 0x2: # NR2
value = float(substrate)
value = float(head)
elif fo & 0x3 == 0x3: # NR3
value = float(substrate)
value = float(head)
else:
raise error.SubstrateUnderrunError(
'Unknown NR (tag %s)' % fo
@@ -281,13 +292,11 @@ class RealDecoder(AbstractSimpleDecoder):
raise error.SubstrateUnderrunError(
'Bad character Real syntax'
)
elif fo & 0xc0 == 0x40: # special real value
pass
else:
raise error.SubstrateUnderrunError(
'Unknown encoding (tag %s)' % fo
)
return self._createComponent(asn1Spec, tagSet, value), substrate
return self._createComponent(asn1Spec, tagSet, value), tail
class SequenceDecoder(AbstractConstructedDecoder):
protoComponent = univ.Sequence()
@@ -301,17 +310,15 @@ class SequenceDecoder(AbstractConstructedDecoder):
return r.getComponentPositionNearType(t, idx)
def valueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun):
substrate = substrate[:length]
length, state, decodeFun, substrateFun):
head, tail = substrate[:length], substrate[length:]
r = self._createComponent(asn1Spec, tagSet)
idx = 0
if not decodeFun:
return r, substrate
while substrate:
if substrateFun:
return substrateFun(r, substrate, length)
while head:
asn1Spec = self._getComponentTagMap(r, idx)
component, substrate = decodeFun(
substrate, asn1Spec
)
component, head = decodeFun(head, asn1Spec)
idx = self._getComponentPositionByType(
r, component.getEffectiveTagSet(), idx
)
@@ -319,18 +326,19 @@ class SequenceDecoder(AbstractConstructedDecoder):
idx = idx + 1
r.setDefaultComponents()
r.verifySizeSpec()
return r, substrate
return r, tail
def indefLenValueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun):
length, state, decodeFun, substrateFun):
r = self._createComponent(asn1Spec, tagSet)
if substrateFun:
return substrateFun(r, substrate, length)
idx = 0
while substrate:
asn1Spec = self._getComponentTagMap(r, idx)
if not decodeFun:
return r, substrate
component, substrate = decodeFun(substrate, asn1Spec)
if component == eoo.endOfOctets:
if eoo.endOfOctets.isSameTypeWith(component) and \
component == eoo.endOfOctets:
break
idx = self._getComponentPositionByType(
r, component.getEffectiveTagSet(), idx
@@ -348,32 +356,31 @@ class SequenceDecoder(AbstractConstructedDecoder):
class SequenceOfDecoder(AbstractConstructedDecoder):
protoComponent = univ.SequenceOf()
def valueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun):
substrate = substrate[:length]
length, state, decodeFun, substrateFun):
head, tail = substrate[:length], substrate[length:]
r = self._createComponent(asn1Spec, tagSet)
if substrateFun:
return substrateFun(r, substrate, length)
asn1Spec = r.getComponentType()
idx = 0
if not decodeFun:
return r, substrate
while substrate:
component, substrate = decodeFun(
substrate, asn1Spec
)
while head:
component, head = decodeFun(head, asn1Spec)
r.setComponentByPosition(idx, component, asn1Spec is None)
idx = idx + 1
r.verifySizeSpec()
return r, substrate
return r, tail
def indefLenValueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun):
length, state, decodeFun, substrateFun):
r = self._createComponent(asn1Spec, tagSet)
if substrateFun:
return substrateFun(r, substrate, length)
asn1Spec = r.getComponentType()
idx = 0
if not decodeFun:
return r, substrate
while substrate:
component, substrate = decodeFun(substrate, asn1Spec)
if component == eoo.endOfOctets:
if eoo.endOfOctets.isSameTypeWith(component) and \
component == eoo.endOfOctets:
break
r.setComponentByPosition(idx, component, asn1Spec is None)
idx = idx + 1
@@ -401,43 +408,68 @@ class SetOfDecoder(SequenceOfDecoder):
class ChoiceDecoder(AbstractConstructedDecoder):
protoComponent = univ.Choice()
tagFormats = (tag.tagFormatSimple, tag.tagFormatConstructed)
def valueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun):
substrate = substrate[:length]
length, state, decodeFun, substrateFun):
head, tail = substrate[:length], substrate[length:]
r = self._createComponent(asn1Spec, tagSet)
if not decodeFun:
return r, substrate
if substrateFun:
return substrateFun(r, substrate, length)
if r.getTagSet() == tagSet: # explicitly tagged Choice
component, substrate = decodeFun(
substrate, r.getComponentTagMap()
component, head = decodeFun(
head, r.getComponentTagMap()
)
else:
component, substrate = decodeFun(
substrate, r.getComponentTagMap(), tagSet, length, state
component, head = decodeFun(
head, r.getComponentTagMap(), tagSet, length, state
)
if isinstance(component, univ.Choice):
effectiveTagSet = component.getEffectiveTagSet()
else:
effectiveTagSet = component.getTagSet()
r.setComponentByType(effectiveTagSet, component, 0, asn1Spec is None)
return r, tail
def indefLenValueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun, substrateFun):
r = self._createComponent(asn1Spec, tagSet)
if substrateFun:
return substrateFun(r, substrate, length)
if r.getTagSet() == tagSet: # explicitly tagged Choice
component, substrate = decodeFun(substrate, r.getComponentTagMap())
eooMarker, substrate = decodeFun(substrate) # eat up EOO marker
if not eoo.endOfOctets.isSameTypeWith(eooMarker) or \
eooMarker != eoo.endOfOctets:
raise error.PyAsn1Error('No EOO seen before substrate ends')
else:
component, substrate= decodeFun(
substrate, r.getComponentTagMap(), tagSet, length, state
)
if isinstance(component, univ.Choice):
effectiveTagSet = component.getEffectiveTagSet()
else:
effectiveTagSet = component.getTagSet()
r.setComponentByType(effectiveTagSet, component, 0, asn1Spec is None)
return r, substrate
indefLenValueDecoder = valueDecoder
class AnyDecoder(AbstractSimpleDecoder):
protoComponent = univ.Any()
tagFormats = (tag.tagFormatSimple, tag.tagFormatConstructed)
def valueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun):
length, state, decodeFun, substrateFun):
if asn1Spec is None or \
asn1Spec is not None and tagSet != asn1Spec.getTagSet():
# untagged Any container, recover inner header substrate
length = length + len(fullSubstrate) - len(substrate)
substrate = fullSubstrate
substrate = substrate[:length]
return self._createComponent(asn1Spec, tagSet, value=substrate), ''
if substrateFun:
return substrateFun(self._createComponent(asn1Spec, tagSet),
substrate, length)
head, tail = substrate[:length], substrate[length:]
return self._createComponent(asn1Spec, tagSet, value=head), tail
def indefLenValueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet,
length, state, decodeFun):
length, state, decodeFun, substrateFun):
if asn1Spec is not None and tagSet == asn1Spec.getTagSet():
# tagged Any type -- consume header substrate
header = ''
@@ -450,11 +482,12 @@ class AnyDecoder(AbstractSimpleDecoder):
# Any components do not inherit initial tag
asn1Spec = self.protoComponent
if not decodeFun:
return r, substrate
if substrateFun:
return substrateFun(r, substrate, length)
while substrate:
component, substrate = decodeFun(substrate, asn1Spec)
if component == eoo.endOfOctets:
if eoo.endOfOctets.isSameTypeWith(component) and \
component == eoo.endOfOctets:
break
r = r + component
else:
@@ -550,7 +583,10 @@ class Decoder:
self.__tagSetCache = {}
def __call__(self, substrate, asn1Spec=None, tagSet=None,
length=None, state=stDecodeTag, recursiveFlag=1):
length=None, state=stDecodeTag, recursiveFlag=1,
substrateFun=None):
if debug.logger & debug.flagDecoder:
debug.logger('decoder called at scope %s with state %d, working with up to %d octets of substrate: %s' % (debug.scope, state, len(substrate), debug.hexdump(substrate)))
fullSubstrate = substrate
while state != stStop:
if state == stDecodeTag:
@@ -559,6 +595,9 @@ class Decoder:
raise error.SubstrateUnderrunError(
'Short octet stream on tag decoding'
)
if not isOctetsType(substrate) and \
not isinstance(substrate, univ.OctetString):
raise error.PyAsn1Error('Bad octet stream type')
firstOctet = substrate[0]
substrate = substrate[1:]
@@ -598,6 +637,7 @@ class Decoder:
else:
tagSet = lastTag + tagSet
state = stDecodeLength
debug.logger and debug.logger & debug.flagDecoder and debug.logger('tag decoded into %r, decoding length' % tagSet)
if state == stDecodeLength:
# Decode length
if not substrate:
@@ -625,12 +665,13 @@ class Decoder:
for char in lengthString:
length = (length << 8) | oct2int(char)
size = size + 1
state = stGetValueDecoder
substrate = substrate[size:]
if length != -1 and len(substrate) < length:
raise error.SubstrateUnderrunError(
'%d-octet short' % (length - len(substrate))
)
state = stGetValueDecoder
debug.logger and debug.logger & debug.flagDecoder and debug.logger('value length decoded into %d, payload substrate is: %s' % (length, debug.hexdump(length == -1 and substrate or substrate[:length])))
if state == stGetValueDecoder:
if asn1Spec is None:
state = stGetValueDecoderByTag
@@ -669,14 +710,27 @@ class Decoder:
state = stDecodeValue
else:
state = stTryAsExplicitTag
if debug.logger and debug.logger & debug.flagDecoder:
debug.logger('codec %s chosen by a built-in type, decoding %s' % (concreteDecoder and concreteDecoder.__class__.__name__ or "<none>", state == stDecodeValue and 'value' or 'as explicit tag'))
debug.scope.push(concreteDecoder is None and '?' or concreteDecoder.protoComponent.__class__.__name__)
if state == stGetValueDecoderByAsn1Spec:
if isinstance(asn1Spec, (dict, tagmap.TagMap)):
if tagSet in asn1Spec:
__chosenSpec = asn1Spec[tagSet]
else:
__chosenSpec = None
if debug.logger and debug.logger & debug.flagDecoder:
debug.logger('candidate ASN.1 spec is a map of:')
for t, v in asn1Spec.getPosMap().items():
debug.logger(' %r -> %s' % (t, v.__class__.__name__))
if asn1Spec.getNegMap():
debug.logger('but neither of: ')
for i in asn1Spec.getNegMap().items():
debug.logger(' %r -> %s' % (t, v.__class__.__name__))
debug.logger('new candidate ASN.1 spec is %s, chosen by %r' % (__chosenSpec is None and '<none>' or __chosenSpec.__class__.__name__, tagSet))
else:
__chosenSpec = asn1Spec
debug.logger and debug.logger & debug.flagDecoder and debug.logger('candidate ASN.1 spec is %s' % asn1Spec.__class__.__name__)
if __chosenSpec is not None and (
tagSet == __chosenSpec.getTagSet() or \
tagSet in __chosenSpec.getTagMap()
@@ -687,9 +741,11 @@ class Decoder:
__chosenSpec.typeId in self.__typeMap:
# ambiguous type
concreteDecoder = self.__typeMap[__chosenSpec.typeId]
debug.logger and debug.logger & debug.flagDecoder and debug.logger('value decoder chosen for an ambiguous type by type ID %s' % (__chosenSpec.typeId,))
elif baseTagSet in self.__tagMap:
# base type or tagged subtype
concreteDecoder = self.__tagMap[baseTagSet]
debug.logger and debug.logger & debug.flagDecoder and debug.logger('value decoder chosen by base %r' % (baseTagSet,))
else:
concreteDecoder = None
if concreteDecoder:
@@ -700,8 +756,13 @@ class Decoder:
elif tagSet == self.__endOfOctetsTagSet:
concreteDecoder = self.__tagMap[tagSet]
state = stDecodeValue
debug.logger and debug.logger & debug.flagDecoder and debug.logger('end-of-octets found')
else:
concreteDecoder = None
state = stTryAsExplicitTag
if debug.logger and debug.logger & debug.flagDecoder:
debug.logger('codec %s chosen by ASN.1 spec, decoding %s' % (state == stDecodeValue and concreteDecoder.__class__.__name__ or "<none>", state == stDecodeValue and 'value' or 'as explicit tag'))
debug.scope.push(__chosenSpec is None and '?' or __chosenSpec.__class__.__name__)
if state == stTryAsExplicitTag:
if tagSet and \
tagSet[0][1] == tag.tagFormatConstructed and \
@@ -710,34 +771,35 @@ class Decoder:
concreteDecoder = explicitTagDecoder
state = stDecodeValue
else:
concreteDecoder = None
state = self.defaultErrorState
debug.logger and debug.logger & debug.flagDecoder and debug.logger('codec %s chosen, decoding %s' % (concreteDecoder and concreteDecoder.__class__.__name__ or "<none>", state == stDecodeValue and 'value' or 'as failure'))
if state == stDumpRawValue:
concreteDecoder = self.defaultRawDecoder
debug.logger and debug.logger & debug.flagDecoder and debug.logger('codec %s chosen, decoding value' % concreteDecoder.__class__.__name__)
state = stDecodeValue
if state == stDecodeValue:
if recursiveFlag:
decodeFun = self
else:
decodeFun = None
if recursiveFlag == 0 and not substrateFun: # legacy
substrateFun = lambda a,b,c: (a,b[:c])
if length == -1: # indef length
value, substrate = concreteDecoder.indefLenValueDecoder(
fullSubstrate, substrate, asn1Spec, tagSet, length,
stGetValueDecoder, decodeFun
stGetValueDecoder, self, substrateFun
)
else:
value, _substrate = concreteDecoder.valueDecoder(
value, substrate = concreteDecoder.valueDecoder(
fullSubstrate, substrate, asn1Spec, tagSet, length,
stGetValueDecoder, decodeFun
stGetValueDecoder, self, substrateFun
)
if recursiveFlag:
substrate = substrate[length:]
else:
substrate = _substrate
state = stStop
debug.logger and debug.logger & debug.flagDecoder and debug.logger('codec %s yields type %s, value:\n%s\n...remaining substrate is: %s' % (concreteDecoder.__class__.__name__, value.__class__.__name__, value.prettyPrint(), substrate and debug.hexdump(substrate) or '<none>'))
if state == stErrorCondition:
raise error.PyAsn1Error(
'%r not in asn1Spec: %r' % (tagSet, asn1Spec)
)
if debug.logger and debug.logger & debug.flagDecoder:
debug.scope.pop()
debug.logger('decoder left scope %s, call completed' % debug.scope)
return value, substrate
decode = Decoder(tagMap, typeMap)

View File

@@ -1,8 +1,8 @@
# BER encoder
from pyasn1.type import base, tag, univ, char, useful
from pyasn1.codec.ber import eoo
from pyasn1.compat.octets import int2oct, ints2octs, null, str2octs
from pyasn1 import error
from pyasn1.compat.octets import int2oct, oct2int, ints2octs, null, str2octs
from pyasn1 import debug, error
class Error(Exception): pass
@@ -78,9 +78,24 @@ class ExplicitlyTaggedItemEncoder(AbstractItemEncoder):
explicitlyTaggedItemEncoder = ExplicitlyTaggedItemEncoder()
class BooleanEncoder(AbstractItemEncoder):
supportIndefLenMode = 0
_true = ints2octs((1,))
_false = ints2octs((0,))
def encodeValue(self, encodeFun, value, defMode, maxChunkSize):
return value and self._true or self._false, 0
class IntegerEncoder(AbstractItemEncoder):
supportIndefLenMode = 0
supportCompactZero = False
def encodeValue(self, encodeFun, value, defMode, maxChunkSize):
if value == 0: # shortcut for zero value
if self.supportCompactZero:
# this seems to be a correct way for encoding zeros
return null, 0
else:
# this seems to be a widespread way for encoding zeros
return ints2octs((0,)), 0
octets = []
value = int(value) # to save on ops on asn1 type
while 1:
@@ -149,18 +164,15 @@ class ObjectIdentifierEncoder(AbstractItemEncoder):
index = 5
else:
if len(oid) < 2:
raise error.PyAsn1Error('Short OID %s' % value)
raise error.PyAsn1Error('Short OID %s' % (value,))
# Build the first twos
index = 0
subid = oid[index] * 40
subid = subid + oid[index+1]
if subid < 0 or subid > 0xff:
if oid[0] > 6 or oid[1] > 39 or oid[0] == 6 and oid[1] > 15:
raise error.PyAsn1Error(
'Initial sub-ID overflow %s in OID %s' % (oid[index:], value)
'Initial sub-ID overflow %s in OID %s' % (oid[:2], value)
)
octets = (subid,)
index = index + 2
octets = (oid[0] * 40 + oid[1],)
index = 2
# Cycle through subids
for subid in oid[index:]:
@@ -184,6 +196,7 @@ class ObjectIdentifierEncoder(AbstractItemEncoder):
return ints2octs(octets), 0
class RealEncoder(AbstractItemEncoder):
supportIndefLenMode = 0
def encodeValue(self, encodeFun, value, defMode, maxChunkSize):
if value.isPlusInfinity():
return int2oct(0x40), 0
@@ -206,9 +219,11 @@ class RealEncoder(AbstractItemEncoder):
m >>= 1
e += 1
eo = null
while e:
while e not in (0, -1):
eo = int2oct(e&0xff) + eo
e >>= 8
if e == 0 and eo and oct2int(eo[0]) & 0x80:
eo = int2oct(0) + eo
n = len(eo)
if n > 0xff:
raise error.PyAsn1Error('Real exponent overflow')
@@ -268,7 +283,7 @@ class AnyEncoder(OctetStringEncoder):
tagMap = {
eoo.endOfOctets.tagSet: EndOfOctetsEncoder(),
univ.Boolean.tagSet: IntegerEncoder(),
univ.Boolean.tagSet: BooleanEncoder(),
univ.Integer.tagSet: IntegerEncoder(),
univ.BitString.tagSet: BitStringEncoder(),
univ.OctetString.tagSet: OctetStringEncoder(),
@@ -313,6 +328,7 @@ class Encoder:
self.__typeMap = typeMap
def __call__(self, value, defMode=1, maxChunkSize=0):
debug.logger & debug.flagEncoder and debug.logger('encoder called in %sdef mode, chunk size %s for type %s, value:\n%s' % (not defMode and 'in' or '', maxChunkSize, value.__class__.__name__, value.prettyPrint()))
tagSet = value.getTagSet()
if len(tagSet) > 1:
concreteEncoder = explicitlyTaggedItemEncoder
@@ -322,13 +338,16 @@ class Encoder:
elif tagSet in self.__tagMap:
concreteEncoder = self.__tagMap[tagSet]
else:
baseTagSet = value.baseTagSet
if baseTagSet in self.__tagMap:
concreteEncoder = self.__tagMap[baseTagSet]
tagSet = value.baseTagSet
if tagSet in self.__tagMap:
concreteEncoder = self.__tagMap[tagSet]
else:
raise Error('No encoder for %s' % value)
return concreteEncoder.encode(
raise Error('No encoder for %s' % (value,))
debug.logger & debug.flagEncoder and debug.logger('using value codec %s chosen by %r' % (concreteEncoder.__class__.__name__, tagSet))
substrate = concreteEncoder.encode(
self, value, defMode, maxChunkSize
)
debug.logger & debug.flagEncoder and debug.logger('built %s octets of substrate: %s\nencoder completed' % (len(substrate), debug.hexdump(substrate)))
return substrate
encode = Encoder(tagMap, typeMap)

View File

@@ -0,0 +1 @@
# This file is necessary to make this directory a package.

View File

@@ -7,22 +7,25 @@ from pyasn1 import error
class BooleanDecoder(decoder.AbstractSimpleDecoder):
protoComponent = univ.Boolean(0)
def valueDecoder(self, fullSubstrate, substrate, asn1Spec, tagSet, length,
state, decodeFun):
substrate = substrate[:length]
if not substrate:
state, decodeFun, substrateFun):
head, tail = substrate[:length], substrate[length:]
if not head:
raise error.PyAsn1Error('Empty substrate')
byte = oct2int(substrate[0])
byte = oct2int(head[0])
# CER/DER specifies encoding of TRUE as 0xFF and FALSE as 0x0, while
# BER allows any non-zero value as TRUE; cf. sections 8.2.2. and 11.1
# in http://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf
if byte == 0xff:
value = 1
elif byte == 0x00:
value = 0
else:
raise error.PyAsn1Error('Boolean CER violation: %s' % byte)
return self._createComponent(asn1Spec, tagSet, value), substrate[1:]
return self._createComponent(asn1Spec, tagSet, value), tail
tagMap = decoder.tagMap.copy()
tagMap.update({
univ.Boolean.tagSet: BooleanDecoder(),
univ.Boolean.tagSet: BooleanDecoder()
})
typeMap = decoder.typeMap

View File

@@ -0,0 +1 @@
# This file is necessary to make this directory a package.

View File

@@ -2,4 +2,8 @@
from pyasn1.type import univ
from pyasn1.codec.cer import decoder
decode = decoder.Decoder(decoder.tagMap, decoder.typeMap)
tagMap = decoder.tagMap
typeMap = decoder.typeMap
Decoder = decoder.Decoder
decode = Decoder(tagMap, typeMap)

View File

@@ -0,0 +1 @@
# This file is necessary to make this directory a package.

View File

@@ -8,6 +8,7 @@ if version_info[0] <= 2:
octs2ints = lambda s: [ oct2int(x) for x in s ]
str2octs = lambda x: x
octs2str = lambda x: x
isOctetsType = lambda s: isinstance(s, str)
else:
ints2octs = bytes
int2oct = lambda x: ints2octs((x,))
@@ -16,3 +17,4 @@ else:
octs2ints = lambda s: [ x for x in s ]
str2octs = lambda x: x.encode()
octs2str = lambda x: x.decode()
isOctetsType = lambda s: isinstance(s, bytes)

65
libs/pyasn1/debug.py Normal file
View File

@@ -0,0 +1,65 @@
import sys
from pyasn1.compat.octets import octs2ints
from pyasn1 import error
from pyasn1 import __version__
flagNone = 0x0000
flagEncoder = 0x0001
flagDecoder = 0x0002
flagAll = 0xffff
flagMap = {
'encoder': flagEncoder,
'decoder': flagDecoder,
'all': flagAll
}
class Debug:
defaultPrinter = sys.stderr.write
def __init__(self, *flags):
self._flags = flagNone
self._printer = self.defaultPrinter
self('running pyasn1 version %s' % __version__)
for f in flags:
if f not in flagMap:
raise error.PyAsn1Error('bad debug flag %s' % (f,))
self._flags = self._flags | flagMap[f]
self('debug category \'%s\' enabled' % f)
def __str__(self):
return 'logger %s, flags %x' % (self._printer, self._flags)
def __call__(self, msg):
self._printer('DBG: %s\n' % msg)
def __and__(self, flag):
return self._flags & flag
def __rand__(self, flag):
return flag & self._flags
logger = 0
def setLogger(l):
global logger
logger = l
def hexdump(octets):
return ' '.join(
[ '%s%.2X' % (n%16 == 0 and ('\n%.5d: ' % n) or '', x)
for n,x in zip(range(len(octets)), octs2ints(octets)) ]
)
class Scope:
def __init__(self):
self._list = []
def __str__(self): return '.'.join(self._list)
def push(self, token):
self._list.append(token)
def pop(self):
return self._list.pop()
scope = Scope()

View File

@@ -0,0 +1 @@
# This file is necessary to make this directory a package.

View File

@@ -120,7 +120,12 @@ class AbstractSimpleAsn1Item(Asn1ItemBase):
def prettyIn(self, value): return value
def prettyOut(self, value): return str(value)
def prettyPrint(self, scope=0): return self.prettyOut(self._value)
def prettyPrint(self, scope=0):
if self._value is noValue:
return '<no value>'
else:
return self.prettyOut(self._value)
# XXX Compatibility stub
def prettyPrinter(self, scope=0): return self.prettyPrint(scope)

View File

@@ -60,12 +60,12 @@ class NamedTypes:
tagMap = self.__namedTypes[idx].getType().getTagMap()
for t in tagMap.getPosMap():
if t in self.__tagToPosIdx:
raise error.PyAsn1Error('Duplicate type %s' % t)
raise error.PyAsn1Error('Duplicate type %s' % (t,))
self.__tagToPosIdx[t] = idx
try:
return self.__tagToPosIdx[tagSet]
except KeyError:
raise error.PyAsn1Error('Type %s not found' % tagSet)
raise error.PyAsn1Error('Type %s not found' % (tagSet,))
def getNameByPosition(self, idx):
try:
@@ -79,12 +79,12 @@ class NamedTypes:
idx = idx - 1
n = self.__namedTypes[idx].getName()
if n in self.__nameToPosIdx:
raise error.PyAsn1Error('Duplicate name %s' % n)
raise error.PyAsn1Error('Duplicate name %s' % (n,))
self.__nameToPosIdx[n] = idx
try:
return self.__nameToPosIdx[name]
except KeyError:
raise error.PyAsn1Error('Name %s not found' % name)
raise error.PyAsn1Error('Name %s not found' % (name,))
def __buildAmbigiousTagMap(self):
ambigiousTypes = ()

View File

@@ -15,10 +15,10 @@ class NamedValues:
name = namedValue
val = automaticVal
if name in self.nameToValIdx:
raise error.PyAsn1Error('Duplicate name %s' % name)
raise error.PyAsn1Error('Duplicate name %s' % (name,))
self.nameToValIdx[name] = val
if val in self.valToNameIdx:
raise error.PyAsn1Error('Duplicate value %s' % name)
raise error.PyAsn1Error('Duplicate value %s=%s' % (name, val))
self.valToNameIdx[val] = name
self.namedValues = self.namedValues + ((name, val),)
automaticVal = automaticVal + 1

View File

@@ -18,7 +18,7 @@ class Tag:
def __init__(self, tagClass, tagFormat, tagId):
if tagId < 0:
raise error.PyAsn1Error(
'Negative tag ID (%s) not allowed' % tagId
'Negative tag ID (%s) not allowed' % (tagId,)
)
self.__tag = (tagClass, tagFormat, tagId)
self.uniq = (tagClass, tagId)

View File

@@ -28,7 +28,7 @@ class TagMap:
def clone(self, parentType, tagMap, uniq=False):
if self.__defType is not None and tagMap.getDef() is not None:
raise error.PyAsn1Error('Duplicate default value at %s' % self)
raise error.PyAsn1Error('Duplicate default value at %s' % (self,))
if tagMap.getDef() is not None:
defType = tagMap.getDef()
else:
@@ -37,7 +37,7 @@ class TagMap:
posMap = self.__posMap.copy()
for k in tagMap.getPosMap():
if uniq and k in posMap:
raise error.PyAsn1Error('Duplicate positive key %s' % k)
raise error.PyAsn1Error('Duplicate positive key %s' % (k,))
posMap[k] = parentType
negMap = self.__negMap.copy()

Some files were not shown because too many files have changed in this diff Show More