<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Using Unicode with ElementTidy</title>
	<atom:link href="http://www.blueskyonmars.com/2005/01/31/using-unicode-with-elementtidy/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.blueskyonmars.com/2005/01/31/using-unicode-with-elementtidy/</link>
	<description>Kevin Dangoor on Software Development</description>
	<pubDate>Mon, 01 Dec 2008 21:39:15 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.3</generator>
		<item>
		<title>By: olpa, OSS developer &#187; Blog Archive &#187; shut up, you dummy 7-bit Python</title>
		<link>http://www.blueskyonmars.com/2005/01/31/using-unicode-with-elementtidy/#comment-117453</link>
		<dc:creator>olpa, OSS developer &#187; Blog Archive &#187; shut up, you dummy 7-bit Python</dc:creator>
		<pubDate>Fri, 23 Mar 2007 23:40:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.blueskyonmars.com/wordpress/?p=1240#comment-117453</guid>
		<description>[...] The Illusive setdefaultencoding * Using Unicode with ElementTidy * [Zopyrus] [...]</description>
		<content:encoded><![CDATA[<p>[...] The Illusive setdefaultencoding * Using Unicode with ElementTidy * [Zopyrus] [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Patrick Hall</title>
		<link>http://www.blueskyonmars.com/2005/01/31/using-unicode-with-elementtidy/#comment-606</link>
		<dc:creator>Patrick Hall</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.blueskyonmars.com/wordpress/?p=1240#comment-606</guid>
		<description>Hi Kevin, 

I've been banging my head against similar problems with HTML and also with processing syndicated feeds with feedparser. Some, but not all of them, seem to have been resolved by editing sitecustomize.py. 

I'm nowhere near your level of gurudom (I don't grok C much), but could you explain why you think editing sitecustomize.py is a bad idea? I understand that in some cases (like on a hosted web server), the file may not be accessible to users without root, but isn't it fair to say that utf-8 *should* be the default encoding, if at all possible?

Interesting post, cheers...</description>
		<content:encoded><![CDATA[<p>Hi Kevin, </p>
<p>I&#8217;ve been banging my head against similar problems with HTML and also with processing syndicated feeds with feedparser. Some, but not all of them, seem to have been resolved by editing sitecustomize.py. </p>
<p>I&#8217;m nowhere near your level of gurudom (I don&#8217;t grok C much), but could you explain why you think editing sitecustomize.py is a bad idea? I understand that in some cases (like on a hosted web server), the file may not be accessible to users without root, but isn&#8217;t it fair to say that utf-8 *should* be the default encoding, if at all possible?</p>
<p>Interesting post, cheers&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin</title>
		<link>http://www.blueskyonmars.com/2005/01/31/using-unicode-with-elementtidy/#comment-607</link>
		<dc:creator>Kevin</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.blueskyonmars.com/wordpress/?p=1240#comment-607</guid>
		<description>Whether or not sitecustomize is a good solution for you probably depends on your app. The nice thing about sitecustomize is that it can be anywhere in your pythonpath, so you generally don't need root access to use it.

If you're using sitecustomize on an app for which you completely control the environment, that should work just fine. But, if you're distributing the app, it's really a lot better if your application can run properly in whatever environment it gets dropped into.

By the way, with some help from the python-list, I solved my problem:

&lt;a href="http://www.blueskyonmars.com/archives/2005/02/03/dont_mix_unicode_and_encoded_strings.html"&gt;http://www.blueskyonmars.com/archives/2005/02/03/dont_mix_unicode_and_encoded_strings.html&lt;/a&gt;

Take a look at that posting, and you may be able to get around using a sitecustomize file as well.

Kevin</description>
		<content:encoded><![CDATA[<p>Whether or not sitecustomize is a good solution for you probably depends on your app. The nice thing about sitecustomize is that it can be anywhere in your pythonpath, so you generally don&#8217;t need root access to use it.</p>
<p>If you&#8217;re using sitecustomize on an app for which you completely control the environment, that should work just fine. But, if you&#8217;re distributing the app, it&#8217;s really a lot better if your application can run properly in whatever environment it gets dropped into.</p>
<p>By the way, with some help from the python-list, I solved my problem:</p>
<p><a href="http://www.blueskyonmars.com/archives/2005/02/03/dont_mix_unicode_and_encoded_strings.html">http://www.blueskyonmars.com/archives/2005/02/03/dont_mix_unicode_and_encoded_strings.html</a></p>
<p>Take a look at that posting, and you may be able to get around using a sitecustomize file as well.</p>
<p>Kevin</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Patrick Hall</title>
		<link>http://www.blueskyonmars.com/2005/01/31/using-unicode-with-elementtidy/#comment-608</link>
		<dc:creator>Patrick Hall</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.blueskyonmars.com/wordpress/?p=1240#comment-608</guid>
		<description>Hi again, thanks for the response. 

This is actually the part where I'm not sure what best practice is, because I'm under the impression that in fact you *can't* add sitecustomize.py anywhere in your path, because of some programming oddities. These are mentioned in Martin Doudoroff's article (which you recently blogged about):

"You want this application to handle multi-lingual text, so you're going to take advantage of Unicode. The first thing you will probably want to do is set up a sitecustomize.py file in the Lib directory of your python installation and designate a Unicode encoding (probably UTF-8) as the default encoding for Python.

import sys
sys.setdefaultencoding("utf-8")

Important: as of Python 2.2, as far as I can tell, you can only call the setdefaultencoding method from within sitecustomize.py. You cannot perform this step from within your application! I don't understand why Guido set it up this way, but I'm sure he had his reasons. "

I'm really glad you brought that article to my attention, because it's the first time I've seen someone just come out and say that setting utf-8 as default in sitecustomize.py is the way to go. The number of headaches that it seems to solve is, imho, far greater than the number involved in editing the file.

I see what you're saying about creating apps for distribution. That's something I never really do, though; the only "distribution" I do is by way of CGI apps or something like that.

Thanks for the discussion!</description>
		<content:encoded><![CDATA[<p>Hi again, thanks for the response. </p>
<p>This is actually the part where I&#8217;m not sure what best practice is, because I&#8217;m under the impression that in fact you *can&#8217;t* add sitecustomize.py anywhere in your path, because of some programming oddities. These are mentioned in Martin Doudoroff&#8217;s article (which you recently blogged about):</p>
<p>&#8220;You want this application to handle multi-lingual text, so you&#8217;re going to take advantage of Unicode. The first thing you will probably want to do is set up a sitecustomize.py file in the Lib directory of your python installation and designate a Unicode encoding (probably UTF-8) as the default encoding for Python.</p>
<p>import sys<br />
sys.setdefaultencoding("utf-8")</p>
<p>Important: as of Python 2.2, as far as I can tell, you can only call the setdefaultencoding method from within sitecustomize.py. You cannot perform this step from within your application! I don&#8217;t understand why Guido set it up this way, but I&#8217;m sure he had his reasons. &#8221;</p>
<p>I&#8217;m really glad you brought that article to my attention, because it&#8217;s the first time I&#8217;ve seen someone just come out and say that setting utf-8 as default in sitecustomize.py is the way to go. The number of headaches that it seems to solve is, imho, far greater than the number involved in editing the file.</p>
<p>I see what you&#8217;re saying about creating apps for distribution. That&#8217;s something I never really do, though; the only &#8220;distribution&#8221; I do is by way of CGI apps or something like that.</p>
<p>Thanks for the discussion!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin</title>
		<link>http://www.blueskyonmars.com/2005/01/31/using-unicode-with-elementtidy/#comment-609</link>
		<dc:creator>Kevin</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.blueskyonmars.com/wordpress/?p=1240#comment-609</guid>
		<description>sitecustomize.py can be anywhere on your PYTHONPATH. (I know, because I've tried it out of desperation and it does work.) The point made in that article is that you can't, at some random point in your program, just import sys and set the encoding. It has to happen at startup time, so it has to be on the path at startup time.

sitecustomize.py is definitely the easiest thing to do when you're in complete control of what the PYTHONPATH will be at startup.</description>
		<content:encoded><![CDATA[<p>sitecustomize.py can be anywhere on your PYTHONPATH. (I know, because I&#8217;ve tried it out of desperation and it does work.) The point made in that article is that you can&#8217;t, at some random point in your program, just import sys and set the encoding. It has to happen at startup time, so it has to be on the path at startup time.</p>
<p>sitecustomize.py is definitely the easiest thing to do when you&#8217;re in complete control of what the PYTHONPATH will be at startup.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fredrik</title>
		<link>http://www.blueskyonmars.com/2005/01/31/using-unicode-with-elementtidy/#comment-610</link>
		<dc:creator>Fredrik</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.blueskyonmars.com/wordpress/?p=1240#comment-610</guid>
		<description>sys.setdefaultencoding() was added for experimentation during Unicode development, and should not be used in production code.  All sorts of ugliness can happen if you mess around with the conversion rules (especially if you use a variable-width encoding).  It's not that hard to write encoding-aware code, really.

As for ElementTidy, an encoding attribute was added to the recent 1.0b1 release.
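
A minimal sketch of the encoding-aware style mentioned above, for illustration only. It assumes a Python 3 interpreter, where bytes and str are separate types (in the Python 2 of this thread the equivalents would be str and unicode). The helper names to_text and to_bytes are hypothetical and are not part of ElementTidy or the standard library. The idea is to decode once where data comes in, work only with text in between, and encode once where data goes out, so the default encoding never comes into play.

```python
# Hypothetical helpers; the names are illustrative, not from any library.

def to_text(raw, encoding="utf-8"):
    """Decode bytes once, at the input boundary; pass text through untouched."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


def to_bytes(text, encoding="utf-8"):
    """Encode text once, at the output boundary."""
    return text.encode(encoding)


# Bytes as they might arrive from a file or a socket (sample data).
raw_title = "Caf\u00e9".encode("utf-8")

title = to_text(raw_title)      # text from here on
heading = "Title: " + title     # text is only ever combined with text
payload = to_bytes(heading)     # bytes again only when writing out
```

With this pattern the encoding is always named explicitly at the edges of the program, which is why no change to the interpreter-wide default should be needed.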
</description>
		<content:encoded><![CDATA[<p>sys.setdefaultencoding() was added for experimentation during Unicode development, and should not be used in production code.  All sorts of ugliness can happen if you mess around with the conversion rules (especially if you use a variable-width encoding).  It&#8217;s not that hard to write encoding-aware code, really.</p>
<p>As for ElementTidy, an encoding attribute was added to the recent 1.0b1 release.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

