I’m the administrator of kbin.life, a general purpose/tech orientated kbin instance.



  • Why? Because you can. But in terms of useful reasons?

    Cellphones and the Internet need infrastructure to work, and that infrastructure can be disabled during a natural disaster or a war, or in some cases even by your own government.

    But if I want to communicate, all I need is a piece of wire, somewhere to hang it, and a 12 V battery, and I can reach stations thousands of miles away.

    Personally I just think that’s cool.






  • My mbin instance is behind Cloudflare, so I filter the AS numbers there. Those requests don’t even reach my server.
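
    For reference, that filtering can be a single Cloudflare custom rule with the action set to Block. A minimal sketch, not my exact rule, and note the ASN field name has varied between dashboard versions (ip.geoip.asnum in older firewall rules, ip.src.asnum in newer ones):

      # Cloudflare custom rule expression, action: Block
      (ip.geoip.asnum in {45102 136907 132203 32934})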

    For the sites that aren’t behind Cloudflare, yep, it’s at the nginx level. I did consider the firewall level, maybe a dedicated chain for it, but since I was already blocking in nginx I just did it there for now. It keeps them off the content, but yes, it does tell them there’s a website there to leech from if they change their tactics, for example.
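
    If you do want it at the firewall level, a minimal nftables sketch of that dedicated-chain idea (the table/set/chain names are made up, and the two prefixes are placeholders for the real ASN-derived lists):

      nft add table inet filter
      nft add set inet filter ai_bots '{ type ipv4_addr; flags interval; }'
      nft add element inet filter ai_bots '{ 47.74.0.0/16, 157.240.0.0/16 }'
      nft add chain inet filter ai_block '{ type filter hook input priority 0; }'
      nft add rule inet filter ai_block ip saddr @ai_bots drop

    Dropping there has the side benefit that the bots see nothing at all, rather than a 403 telling them a site exists.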

    You need to block the whole ASN too. The ones using Chrome/Firefox UAs change IP every 5 minutes, hopping to another random address in their huuuuuge pools.


  • Yeah, I probably should look into whether there are any good plugins that handle this on a community-submission basis, because yes, it’s a pain to keep up with whatever trick they’re doing next.

    And unlike traditional web crawlers, which generally check a URL here and there, AI bots absolutely rip through your sites like something rabid.


  • If you’re running nginx, here is what I am using:

    if ($http_user_agent ~* "SemrushBot|Semrush|AhrefsBot|MJ12bot|YandexBot|YandexImages|MegaIndex.ru|BLEXbot|BLEXBot|ZoominfoBot|YaK|VelenPublicWebCrawler|SentiBot|Vagabondo|SEOkicks|SEOkicks-Robot|mtbot/1.1.0i|SeznamBot|DotBot|Cliqzbot|coccocbot|python|Scrap|SiteCheck-sitecrawl|MauiBot|Java|GumGum|Clickagy|AspiegelBot|Yandex|TkBot|CCBot|Qwantify|MBCrawler|serpstatbot|AwarioSmartBot|Semantici|ScholarBot|proximic|GrapeshotCrawler|IAScrawler|linkdexbot|contxbot|PlurkBot|PaperLiBot|BomboraBot|Leikibot|weborama-fetcher|NTENTbot|Screaming Frog SEO Spider|admantx-usaspb|Eyeotabot|VoluumDSP-content-bot|SirdataBot|adbeat_bot|TTD-Content|admantx|Nimbostratus-Bot|Mail.RU_Bot|Quantcastboti|Onespot-ScraperBot|Taboolabot|Baidu|Jobboerse|VoilaBot|Sogou|Jyxobot|Exabot|ZGrab|Proximi|Sosospider|Accoona|aiHitBot|Genieo|BecomeBot|ConveraCrawler|NerdyBot|OutclicksBot|findlinks|JikeSpider|Gigabot|CatchBot|Huaweisymantecspider|Offline Explorer|SiteSnagger|TeleportPro|WebCopier|WebReaper|WebStripper|WebZIP|Xaldon_WebSpider|BackDoorBot|AITCSRoboti|Arachnophilia|BackRub|BlowFishi|perl|CherryPicker|CyberSpyder|EmailCollector|Foobot|GetURL|httplib|HTTrack|LinkScan|Openbot|Snooper|SuperBot|URLSpiderPro|MAZBot|EchoboxBot|SerendeputyBot|LivelapBot|linkfluence.com|TweetmemeBot|LinkisBot|CrowdTanglebot|ClaudeBot|Bytespider|ImagesiftBot|Barkrowler|DataForSeoBo|Amazonbot|facebookexternalhit|meta-externalagent|FriendlyCrawler|GoogleOther|PetalBot|Applebot") { return 403; }

    That will block those that actually use recognisable user agents. I add any I find as I go on. It will catch a lot!

    I also have a huuuuuge IP-based block list, generated by adding all the ranges returned from looking up the following AS numbers:

    AS45102 (Alibaba Cloud)
    AS136907 (Huawei SG)
    AS132203 (Tencent)
    AS32934 (Facebook)

    These outfits run, or have run, bots that impersonate real browser user agents.

    There are various tools online that will return the prefix/IP lists for an autonomous system number.

    I put both into a single file and include it into my web site config files.
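
    As a rough sketch, that combined file can be an nginx geo block like this (the prefixes below are illustrative placeholders only; generate the real lists from the ASN lookups above):

      # asn-block.conf — the geo block must sit at the http level
      geo $asn_blocked {
          default 0;
          47.74.0.0/16     1;  # placeholder Alibaba Cloud (AS45102) range
          114.119.128.0/17 1;  # placeholder Huawei SG (AS136907) range
          43.128.0.0/14    1;  # placeholder Tencent (AS132203) range
          157.240.0.0/16   1;  # placeholder Facebook (AS32934) range
      }

      # then, in each site's server block:
      if ($asn_blocked) { return 403; }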

    EDIT: Just to add, keeping on top of this is a full-time job!

    EDIT 2: Removed the Mojeek bot as it seems to be a normal web crawler.


  • Not sure how it is in the US, but here in the UK there are two ways a business can export.

    1: They pre-clear the customs duty and include it in the sales total (like paying sales tax at the checkout, except it’s the pre-cleared duty fees). The parcel then gets a duty-paid stamp and goes straight through customs (unless, I guess, customs get suspicious and check into it).

    2: They just charge you the item price with no tax applied, in which case you pay the applicable local tax and duties once the product arrives. Here that works like this: they hold the parcel at the local depot and you can either go there to pay and collect it, or pay online and it gets rescheduled for delivery.

    As others have said, it’s not a scam. There’s no requirement for a business to do option 1, and it’s likely only viable for large businesses to register and run the staff/software that knows the duties required for the various destination countries.

    I’ve ordered from Newegg and B&M in the past, for example, and in both cases the items were pre-cleared and arrived promptly without any hassle.

    Maybe there’s something similar for imports into the US too?







  • Now see, I kinda had the idea for a syndicated delivery service (not online orders, but the internet would have been used to create the order data and assign drivers) decades ago. I did some part-time work delivering food back in the late 90s/early 2000s, and I always thought it was so inefficient. The place I worked was very busy and had a very large delivery area, but even so, there were times the owner was paying people to sit outside in their cars talking shit to each other.

    I thought it would make sense to have a larger pool of drivers servicing multiple restaurants/take-aways. Economies of scale would keep drivers utilised while lowering the cost to each place using the service, with some money left over for whoever ran the business that brought it all together.

    I don’t think I ever considered paying less than this guy did (which wasn’t a lot, but would likely translate to $5 or so an hour in the 90s/2000s).

    One thing I find really interesting about Uber Eats/DoorDash (US) and Deliveroo (UK/EU): when you add up their fees, they take a delivery fee from the user, a service fee from the user, and an even bigger service fee from the restaurant, while paying drivers the lowest fee that will keep them interested. Yet I always hear these services are losing money too. How is that even possible?

    Take Deliveroo in the UK. Looking now (I don’t live in a city, so most places are some distance away), a place 4.5 miles away is charging £4.29 for delivery. Let’s make up an imaginary order:

    Order total: £20 (including sales tax/VAT)
    User’s service fee: £2.39 (it seems to be 11% including VAT, capped at a maximum I’m not sure of)
    User’s delivery fee: £4.29 (including VAT, since VAT has to be charged on a service)
    Restaurant service fee: £6 (30% of the VAT-inclusive total; I’m really not sure how this works in tax terms, though)
    Total for the user: £26.68


    Total Deliveroo service revenue:
    Net: £10.57
    VAT: £2.11
    Total: £12.68

    Reading between the lines, from what I can see delivery riders are paid between £3 and £6 per delivery. In the cities that’s probably workable, but I do wonder how it goes in the towns and villages. Most places on my list are 3 miles or more away, some up to 6 miles, and I wonder how £6 compensates someone doing a 10+ mile round trip at times.

    But OK, the pay for drivers doesn’t include any tax, so it comes out of the net total. On this order that means £10.57 net minus at most £6 for the rider, so they are always clearing £4.50 or more in revenue per delivery.
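
    To make the arithmetic explicit, a throwaway sketch of the example order above (every figure is a guess from this post, not a real Deliveroo number):

      #include <stdio.h>

      int main(void) {
          /* Fees from the imaginary £20 order, all VAT inclusive. */
          double fees  = 2.39 + 4.29 + 6.00; /* user service + delivery + restaurant = £12.68 */
          double net   = fees / 1.20;        /* strip 20% UK VAT -> ~£10.57 */
          double rider = 6.00;               /* assumed top-end rider pay per drop */
          printf("gross %.2f, net %.2f, VAT %.2f, left after rider %.2f\n",
                 fees, net, fees - net, net - rider);
          return 0;
      }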

    Yes, they need to pay support staff, but those are in low-cost geographies. Yes, they need development staff and the usual management overhead. And yes, they need servers/cloud time to host all this.

    Looking this up (I’m not sure how good the source is), their revenue in 2023 was £2.7 billion, which I believe. However, they lost £38 million. Where all the costs come from, I am not sure.

    I wonder how these numbers compare to US based operators?


  • I think it’ll be a “we’ll see” situation. This was the main concern for Y2K, and I don’t doubt there’s some partially-patched Y2K-era stuff still around that is still using string dates.

    But the vast majority of software now works with timestamps, and of course some things will need work. With Y2K, though, the vast majority of business software needed changing. This time I think the vast majority will already work correctly, and it’ll be the job of developers (probably in a panic less than a year before, as is the custom) to catch the few outliers. Yes, some will slip through the cracks, but that was also the case last time round.


  • You’re right on every point. But I’m not sure how that goes against what I said.

    Most applications now use the epoch for date and time storage, and for the 2038 problem the work comes down to making sure values are a 64-bit time_t or 64-bit long values (with matching storage), which is a much smaller change than was the case for Y2K. Since more people now use libraries for date and time handling, it’s also likely this will be handled for them.

    Most databases have datetime types which again are almost certainly already ready for 2038.

    I just don’t think the scale is going to be close to the same.


  • Not really processor-based. The timestamp needs to be an unsigned long (not advised, but good for dates up to 2106, though it cannot express dates before 1970) or a long long. I think it’s a bad idea, but I bet some people too lazy to change their database schema will just do this internally.

    The time_t type on Linux is now 64-bit regardless, so recompiling applications that use it will be fine. Of course it’s a problem if the database is storing 32-bit signed integers, but the column type can be changed too, and that really isn’t hard.
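
    As a minimal C sketch of why the signed 32-bit representation is the problem:

      #include <stdio.h>
      #include <stdint.h>
      #include <time.h>

      int main(void) {
          /* 2038-01-19 03:14:07 UTC is the last second a signed 32-bit
             timestamp can hold; one tick later it wraps back to 1901. */
          int64_t last = INT32_MAX;
          int32_t wrapped = (int32_t)(last + 1);
          printf("sizeof(time_t) here: %zu bytes\n", sizeof(time_t));
          printf("last 32-bit second: %lld\n", (long long)last);
          printf("one second later, squeezed into 32 bits: %d\n", wrapped);
          return 0;
      }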

    As for the Y10K problem, I think it will almost entirely be a formatting problem. In the 80s and 90s storage was at a premium, databases were generally much simpler, and dates were very often stored as YYMMDD. There also wasn’t as much use of standard libraries, so fixing the Y2K problem took quite some work, and in some cases there wasn’t time for a proper solution. Where I was working there was a two-step solution.

    One team made the interim change: everywhere a date was read, anything <30 (it wasn’t actually 30, it was another number I forget) was treated as 2000+number and everything else as 1900+number. That kept the existing product fine for another 30 years or so.
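
    Something like this pivot windowing, as a minimal C sketch (the pivot 30 is illustrative, as above):

      #include <stdio.h>

      /* Map a two-digit year to a full year using a pivot value. */
      static int full_year(int yy) {
          return (yy < 30) ? 2000 + yy : 1900 + yy;
      }

      int main(void) {
          printf("07 -> %d, 85 -> %d\n", full_year(7), full_year(85)); /* 2007, 1985 */
          return 0;
      }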

    The other team was writing the new version of the software, which used MSSQL Server as a back-end, with proper datetime-typed columns, and worked correctly with years before and after 2000.

    I suspect this approach wasn’t unusual, and most software now uses some form of epoch datatype, which should be fine for storing, reading, and writing dates beyond Y10K. But some hard-coded date format strings will need to be changed.

    Source: I was there, 3000 years ago.