








Need an expert to help me debug Python scraping code
Posted 2023-05-03, last modified 2024-03-17
https://osoyoo.info/index.php/2023/05/03/need-some-expert-to-help-me-debug-a-python-scraping-code/

I have a Python server that scrapes tracking information for tracking numbers from the website www.ems.com.cn.

It bypasses the captcha on that website, then submits the tracking number to the target server https://www.ems.com.cn:

[Screenshot: http://osoyoo.info/upload/2.png]

It then gets the following result:

[Screenshot: http://osoyoo.info/upload/1.png]

The result above is actually a JSON text string. The Python code sends this JSON back to the user who submitted the tracking number.

In the actual deployment, we run the following command to start the Python server:

docker-compose up -d

A client can then submit a tracking number with an HTTP request. For example, we can use curl to get the JSON response:

curl "http://server_IP:5000/ems/track?tracking_number=LV753481665CN&proxy=1&lang=cn"

It should return JSON containing the correct result in Chinese, but right now it returns the following:

{"success":false,"msg":"Internal Error!","error":"Expecting value: line 1 column 1 (char 0)"}

You must be familiar with the following libraries:

import hashlib
import base64
import json, random, traceback
import requests
from requests.exceptions import ProxyError
from PIL import Image
from io import BytesIO
import cv2
import numpy as np

Please do NOT use headless browser libraries such as Selenium or similar. A headless browser uses too much memory and will crash my server. Also, do NOT use a paid captcha-solving service: we send too many requests, so a paid service is too expensive for us.

For testing, please use the following tracking numbers on www.ems.com.cn:

LV957478336CN
LP717780070CN
LV967714391CN

The original code is here:
https://osoyoo.info/upload/PostAPI.tar.gz
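The error text in the failing response, "Expecting value: line 1 column 1 (char 0)", is the message Python's `json` module raises when the text it is asked to parse is empty or does not start with a JSON value (for example an HTML page beginning with `<`). `requests`' `Response.json()` raises the same message. So a plausible, unconfirmed reading is that the server received a non-JSON reply from www.ems.com.cn (or from the proxy) and tried to decode it. We have not seen the code inside PostAPI.tar.gz, so the sketch below is only a debugging aid: `inspect_upstream_body` is a hypothetical helper, not part of PostAPI, that tries to decode a reply and, on failure, reports what the reply looked like.

```python
import json
from typing import Any, Dict


def inspect_upstream_body(body: str, preview_len: int = 200) -> Dict[str, Any]:
    """Hypothetical debugging helper (not part of PostAPI).

    Tries to parse a reply from the upstream site as JSON. On failure it
    returns the decoder's error message, a rough guess at what the body
    was, and the first `preview_len` characters so the reply can be logged.
    """
    try:
        return {"ok": True, "data": json.loads(body)}
    except json.JSONDecodeError as exc:
        stripped = body.lstrip()
        if not stripped:
            # Empty body: json.loads("") raises
            # "Expecting value: line 1 column 1 (char 0)".
            hint = "empty body"
        elif stripped.startswith("<"):
            # Looks like markup, e.g. an HTML error, captcha, or proxy page.
            hint = "HTML/XML instead of JSON"
        else:
            hint = "non-JSON text"
        return {
            "ok": False,
            "error": str(exc),
            "hint": hint,
            "preview": body[:preview_len],
        }


if __name__ == "__main__":
    # Offline examples only. In the server you would pass resp.text from the
    # requests.Response returned for www.ems.com.cn, for example:
    #   info = inspect_upstream_body(resp.text)
    #   if not info["ok"]:
    #       print(resp.status_code, resp.headers.get("Content-Type"), info)
    for sample in ["", "<html><body>blocked</body></html>", '{"success": true}']:
        print(inspect_upstream_body(sample))
```

Logging the status code, the `Content-Type` header, and the preview at the point where the server currently fails should show whether the site returned an empty reply, an HTML page, or something else, which helps narrow down whether the captcha step, the proxy, or the request parameters are at fault.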