<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
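<title><![CDATA[Xiangdong Blog: Focused on Web Applications and the Beauty of Architecture --- The beauty of architecture lies in refining every detail | The beauty of an application lies in curing the problem at once]]></title> 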
<link>http://www.jackxiang.com/index.php</link> 
<description><![CDATA[Winning in IT, Playin' with IT, Focus on Killer Application, Marketing Meets Technology.]]></description> 
<language>zh-cn</language> 
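<copyright><![CDATA[Xiangdong Blog: Focused on Web Applications and the Beauty of Architecture --- The beauty of architecture lies in refining every detail | The beauty of an application lies in curing the problem at once]]></copyright>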
<item>
<link>http://www.jackxiang.com/post/1697/</link>
<title><![CDATA[[404 detection] Handling and detecting 404 "page not found" responses with cURL]]></title> 
<author>jack &lt;xdy108@126.com&gt;</author>
<category><![CDATA[WEB2.0]]></category>
<pubDate>Thu, 02 Apr 2009 10:26:00 +0000</pubDate> 
<guid>http://www.jackxiang.com/post/1697/</guid> 
<description>
<![CDATA[ 
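	Background: an image scraper written by a contractor hit a 404 but still saved nginx's response body into a .jpg file. When the file was read back, the image could not be displayed because its content was the 404 page.<br/><br/>When fetching a page with curl, success is usually judged by the return value of curl_exec. However, some sites return a 404 status together with a page body, and curl returns that "page not found" body like any other content. Judging success by the curl_exec result alone is therefore misleading, as in the following code:<br/><br/><div class="code">$url = &#039;http://www.cq.xinhuanet.com/house/2008-11/24/content_14996426.htm-&#039;;<br/>$ch = curl_init();<br/>curl_setopt($ch, CURLOPT_URL, $url);<br/>curl_setopt($ch, CURLOPT_ENCODING, &quot;gzip, deflate&quot;);<br/>curl_setopt($ch, CURLOPT_USERAGENT, &quot;Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; CIBA; InfoPath.1; .NET CLR 2.0.50727)&quot;);<br/>curl_setopt($ch, CURLOPT_MAXREDIRS, 5);<br/>curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow Location redirects automatically<br/>curl_setopt($ch, CURLOPT_TIMEOUT, 10); // timeout in seconds<br/>curl_setopt($ch, CURLOPT_HEADER, 1); // include response headers in the output<br/>//curl_setopt($ch, CURLOPT_NOBODY, 0);<br/>curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response as a string<br/><br/>$contents = curl_exec($ch);<br/>curl_close($ch);<br/><br/>// Misleading check: a 404 page that has a body still counts as success here<br/>if (false === $contents &#124;&#124; empty($contents)) &#123;<br/>echo &quot;Failed to fetch the page!&quot;;<br/>&#125; else &#123;<br/>echo $contents;<br/>&#125;</div><br/><br/>The manual shows that curl also provides the curl_getinfo function. The HTTP status code should be checked instead:<br/><br/><div class="code">$contents = curl_exec($ch);<br/>$http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);<br/>curl_close($ch);<br/>// curl_exec() returns false on transport errors; 4xx are client errors, 5xx are server errors<br/>if ($contents === false &#124;&#124; $http_code &gt;= 400) &#123;<br/>echo &quot;Request failed!&quot;;<br/>exit;<br/>&#125; else &#123;<br/>echo $contents;<br/>&#125;<br/></div><br/><br/><br/>Update: another version found online:<br/><textarea name="code" class="php" rows="15" cols="100">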
&lt;?php
$curl = curl_init();
$url=&quot;http://p1.img.cctvpic.com/xiyou/userimage/space/auth/2014/01/13/20140113-224759-752837.jpg&quot;;
curl_setopt($curl, CURLOPT_URL, $url); // set the URL
curl_setopt($curl, CURLOPT_HEADER, 1); // include the response headers
curl_setopt($curl, CURLOPT_NOBODY, true); // skip the body; only the headers are needed
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return the result as a string instead of printing it
$data = curl_exec($curl); // execute the request
$flag = curl_getinfo($curl, CURLINFO_HTTP_CODE); // read the HTTP status code
if ($flag == 404) &#123;
  echo &quot;not found&quot;;
&#125;
curl_close($curl); 
</textarea><br/>Added: 2014-01-15<br/><br/>
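The status check above can be tied back to the image-download problem from the background. The following is only a minimal sketch under stated assumptions: the function name fetch_to_file, the /tmp/example.jpg path, and the rule that only 2xx responses count as success are illustrative and not part of the original post. CURLOPT_HEADER is left off so that response headers are not written into the saved file.<br/><br/><textarea name="code" class="php" rows="15" cols="100">
&lt;?php
// Hypothetical helper (an assumption, not from the original post):
// save $url to $path only when the final HTTP status is 2xx.
function fetch_to_file($url, $path)
&#123;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10); // timeout in seconds
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body as a string

    $body = curl_exec($ch);
    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE); // read before curl_close()
    curl_close($ch);

    // curl_exec() returns false on transport errors; a 4xx/5xx response
    // still returns a body, so the status code must be checked as well.
    if ($body === false &#124;&#124; $http_code &lt; 200 &#124;&#124; $http_code &gt;= 300) &#123;
        return false;
    &#125;
    return file_put_contents($path, $body) !== false;
&#125;

// Example usage; the output path is an assumed placeholder.
$url = &quot;http://p1.img.cctvpic.com/xiyou/userimage/space/auth/2014/01/13/20140113-224759-752837.jpg&quot;;
if (!fetch_to_file($url, &quot;/tmp/example.jpg&quot;)) &#123;
    echo &quot;download failed&quot;;
&#125;
</textarea><br/>With this shape, a 404 error page is never written to disk, so a broken .jpg containing HTML cannot be produced by this code path.<br/><br/>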
]]>
</description>
</item>
</channel>
</rss>