import json
import random

import requests

import dbconnect

citylist = [
    ["上海", "b459596e9d3b11ec86d518dbf22ff9d5b45959c49d3b11ec86d518dbf22ff9d5"],
    ["北京", "b459a1299d3b11ec86d518dbf22ff9d5b459a12f9d3b11ec86d518dbf22ff9d5"],
    ["广州", "b459e6f69d3b11ec86d518dbf22ff9d5b459e6fc9d3b11ec86d518dbf22ff9d5"],
    ["深圳", "b45a32ba9d3b11ec86d518dbf22ff9d5b45a32bf9d3b11ec86d518dbf22ff9d5"],
    ["天津", "b45a74d69d3b11ec86d518dbf22ff9d5b45a74dc9d3b11ec86d518dbf22ff9d5"],
    ["杭州", "b45ab9f19d3b11ec86d518dbf22ff9d5b45ab9f79d3b11ec86d518dbf22ff9d5"],
    ["南京", "b45b09869d3b11ec86d518dbf22ff9d5b45b09909d3b11ec86d518dbf22ff9d5"],
    ["苏州", "b45b60989d3b11ec86d518dbf22ff9d5b45b609d9d3b11ec86d518dbf22ff9d5"],
    ["成都", "b45bc2779d3b11ec86d518dbf22ff9d5b45bc27e9d3b11ec86d518dbf22ff9d5"],
    ["武汉", "b45c25479d3b11ec86d518dbf22ff9d5b45c254f9d3b11ec86d518dbf22ff9d5"],
    ["重庆", "b45c8f159d3b11ec86d518dbf22ff9d5b45c8f1e9d3b11ec86d518dbf22ff9d5"],
    ["西安", "b45cd2609d3b11ec86d518dbf22ff9d5b45cd2669d3b11ec86d518dbf22ff9d5"],
    ["青岛", "b45d14b89d3b11ec86d518dbf22ff9d5b45d14bd9d3b11ec86d518dbf22ff9d5"],
    ["济南", "b45d56239d3b11ec86d518dbf22ff9d5b45d56289d3b11ec86d518dbf22ff9d5"],
    ["威海", "b45db0de9d3b11ec86d518dbf22ff9d5b45db0e49d3b11ec86d518dbf22ff9d5"],
    ["长春", "b45e04259d3b11ec86d518dbf22ff9d5b45e042b9d3b11ec86d518dbf22ff9d5"],
    ["大连", "b45e459c9d3b11ec86d518dbf22ff9d5b45e45a19d3b11ec86d518dbf22ff9d5"],
    ["佛山", "b45e89e49d3b11ec86d518dbf22ff9d5b45e89e99d3b11ec86d518dbf22ff9d5"],
    ["贵阳", "b45ee1b49d3b11ec86d518dbf22ff9d5b45ee1ba9d3b11ec86d518dbf22ff9d5"],
    ["合肥", "b45f24b39d3b11ec86d518dbf22ff9d5b45f24b99d3b11ec86d518dbf22ff9d5"],
    ["呼和浩特", "b45f682a9d3b11ec86d518dbf22ff9d5b45f682f9d3b11ec86d518dbf22ff9d5"],
    ["昆明", "b45fac4e9d3b11ec86d518dbf22ff9d5b45fac549d3b11ec86d518dbf22ff9d5"],
    ["兰州", "b45fef599d3b11ec86d518dbf22ff9d5b45fef5f9d3b11ec86d518dbf22ff9d5"],
    ["南宁", "b46048ab9d3b11ec86d518dbf22ff9d5b46048b09d3b11ec86d518dbf22ff9d5"],
    ["秦皇岛", "b4608a8c9d3b11ec86d518dbf22ff9d5b4608a919d3b11ec86d518dbf22ff9d5"],
    ["沈阳", "b460ce4a9d3b11ec86d518dbf22ff9d5b460ce509d3b11ec86d518dbf22ff9d5"],
    ["太原", "b461113f9d3b11ec86d518dbf22ff9d5b46111449d3b11ec86d518dbf22ff9d5"],
    ["唐山", "b46152c49d3b11ec86d518dbf22ff9d5b46152ca9d3b11ec86d518dbf22ff9d5"],
    ["无锡", "b46195ab9d3b11ec86d518dbf22ff9d5b46195b09d3b11ec86d518dbf22ff9d5"],
    ["扬州", "b461d8799d3b11ec86d518dbf22ff9d5b461d87e9d3b11ec86d518dbf22ff9d5"],
]

USER_AGENT_LIST = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
    "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
]
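
# Pick one User-Agent at random when the module is loaded.
head = {
    'User-Agent': random.choice(USER_AGENT_LIST)
}

f = 0  # number of records seen in the responses
c = 0  # number of records successfully inserted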


def cfindinfo(city, data):
    """Parse one ranking response and insert each shop into dazhonginfo."""
    global f, c
    dbcon = dbconnect.DBConnect()
    dbcon.connectDatabase()
    for idata in json.loads(data)["shopBeans"]:
        f += 1
        ishopName = idata["shopName"]
        ishopId = idata["shopId"]
        ishopPower = idata["shopPower"]
        imainRegionName = idata["mainRegionName"]
        imainCategoryName = idata["mainCategoryName"]
        itasteScore = idata["score1"]
        ienvironmentScore = idata["score2"]
        iserviceScore = idata["score3"]
        iavgPrice = idata["avgPrice"]
        ishopAddress = idata["address"]
        # str() keeps the concatenation safe if shopId arrives as a number.
        ishopUrl = "http://192.168.232.132/shop/" + str(ishopId)
        idefaultPic = idata["defaultPic"]
        sql = '''insert into dazhonginfo(city, shopName, shopId, shopPower, mainRegionName,
              mainCategoryName, tasteScore, environmentScore, serviceScore, avgPrice,
              shopAddress, shopUrl, defaultPic)
              VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)'''
        params = (city, ishopName, ishopId, ishopPower, imainRegionName,
                  imainCategoryName, itasteScore, ienvironmentScore, iserviceScore,
                  iavgPrice, ishopAddress, ishopUrl, idefaultPic)
        try:
            dbcon.insert(sql, *params)
            c += 1
            print("----- inserted:", c, "rows ------")
        except Exception:
            # Any insert failure is treated as a duplicate row, as in the original.
            print("Already exists; skipping duplicate insert.")
    print("Total records:", f)
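

def cinfoSpider(clist):
    """Fetch the shop ranking for one [city, rankId] pair and store it."""
    city = clist[0]
    url = clist[1]
    # The original URL string began with a stray space; it is removed here.
    cbase_url = "http://192.168.232.132/mylist/ajax/shoprank?rankId=" + url
    html = requests.get(cbase_url, headers=head)
    cfindinfo(city=city, data=html.text)


if __name__ == '__main__':
    # city.json is expected to hold [city, rankId] pairs in the same shape as citylist.
    # The city names are Chinese, so the file is read explicitly as UTF-8.
    with open("city.json", "r", encoding="utf-8") as file:
        my_list = json.load(file)
    for cdata in my_list:
        cinfoSpider(cdata)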