Search - "scrape"
-
That wasn't so hard :D
I managed to scrape all the images using my Java API in a couple of lines of code, used Apache Fluent to quickly download all of them, and imgflip to turn them into a GIF.
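For the curious, a rough TypeScript sketch of that flow. The original was Java plus Apache Fluent, so the page URL, the naive <img> regex, and the file names here are all illustrative, not the actual code:

```typescript
import { writeFile } from "node:fs/promises";

// Fetch a page, pull out the <img> sources, and download each image to disk.
async function scrapeImages(pageUrl: string): Promise<void> {
  const html = await (await fetch(pageUrl)).text();
  const urls = [...html.matchAll(/<img[^>]+src="([^"]+)"/g)].map((m) => m[1]);
  await Promise.all(
    urls.map(async (url, i) => {
      const img = await (await fetch(new URL(url, pageUrl))).arrayBuffer();
      await writeFile(`image-${i}.jpg`, Buffer.from(img)); // GIF assembly was done on imgflip
    })
  );
}

scrapeImages("https://example.com/gallery");
```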
Credit for the original rant and idea goes to @linuxer4fun
https://www.devrant.io/rants/42285015
-
!Story
The day I became 4chan's 400-pound Chinese hacker.
I built this front-end solution for a client (but behind a back-end login), and we get on the line with some fancy European team who will handle penetration testing for the client as we near dev completion.
They seem... pretty confident in themselves, and pretty disrespectful to the LAMP environment, and they make the client worry that even though it's behind a login, the project is still vulnerable. No idea why the client hired an uppity .NET house to test a LAMP app. I don't even bother asking these questions anymore...
And worse, they insist we allow them to scrape for vulnerabilities BEHIND the server-side login, as though a user account had already been compromised.
So I know I want to fuck with them. I sit around and smoke some weed and just let this issue marinate in my crazy-ass brain for a bit, trying to think of a way to obfuscate all this localStorage and what it's doing... And then, inspiration strikes.
I know this library for compressing JSON. I only use it when localStorage space gets tight, and this project was only storing a few KB to localStorage... so compression was unnecessary, but what the hell. Problem: it would be obvious from the exposed source that it was being called.
After a little more thought, I decide to override the addslashes and stripslashes functions and do the compression/decompression from within those overrides.
I then minify the whole thing and stash it in the minified jQuery file.
So, what LOOKS from the exposed client-side code to be a simple addslashes ends up compressing the JSON before putting it in localStorage. And what LOOKS like a stripslashes decompresses.
Now, the compression does some bit math that frankly is over my head, but the practical result is that if you output the compressed data, it looks like Mandarin and random characters. As a result, everything that can be seen in dev tools looks like the attached image.
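A minimal sketch of the trick, assuming the app shipped JS ports of PHP's addslashes/stripslashes and that the compressor was something like lz-string (whose UTF-16 output famously looks like random CJK characters, which would explain the "Mandarin"):

```typescript
import LZString from "lz-string";

// What the exposed source appears to call before every localStorage write.
// Looks like plain escaping; actually compresses the payload.
function addslashes(json: string): string {
  return LZString.compress(json);
}

// Looks like plain un-escaping; actually decompresses.
function stripslashes(stored: string): string {
  return LZString.decompress(stored) ?? "";
}

// Usage: anything inspected in dev tools is now compressed gibberish.
localStorage.setItem("state", addslashes(JSON.stringify({ user: "jane", role: "admin" })));
const state = JSON.parse(stripslashes(localStorage.getItem("state") ?? ""));
```

Minified and buried in the jQuery bundle, nothing about the call sites gives the game away.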
So we GIVE the penetration team login credentials... they log in and start trying to crack it.
I sit and wait. Grinning as fuck.
Not even an hour goes by and they call an emergency meeting. I can barely contain laughter.
We get my PM and me and then several guys from their team on the line. They share screen and show the dev tools.
"We think you may have been compromised by a Chinese hacker!"
I mute and then laugh my ass off. Holy shit, this is maybe the best thing I've ever done.
My PM, who has seen me use the JSON compression technique before and knows exactly what's up, starts telling them about it so they don't freak out. And finally I unmute and manage a "Guys... I'm standing right here" between gasps of laughter.
If only it was more common to use video in these calls because I WISH I could have seen their faces.
Anyway, they calmed their attitude down, we told them how to decompress the localStorage, and then they still didn't find jack shit, because I'm a fucking badass and even after we gave them keys to the login and keys to my secret localStorage, it only led to AWS Cognito-protected async calls.
Anyway, that's the story of how I became a "Chinese hacker" and made a room full of penetration testers look like morons with a (reasonably) simple JS trick.
-
What's your story of when you had to scrap your code for the better, when you'd been working on it for days and had to remove it all?
-
3 rants for the price of 1, isn't that a great deal!
1. HP, you braindead fucking morons!!!
So recently I disassembled this HP laptop of mine to unfuck it at the hardware level. Some issues with the hinge that I had to solve. So I had to disassemble not only the bottom of the laptop but also the display panel itself. Turns out that HP - being the certified enganeers they are - made the following fuckups, with probably many more that I didn't even notice yet.
- They used fucking glue to ensure that the bottom of the display frame stays connected to the panel. Cheap solution to what should've been "MAKE A FUCKING DECENT FRAME?!" but a royal pain in the ass to disassemble. Luckily I was careful and didn't damage the panel, but the chance of that happening was most certainly nonzero.
- They connected the ribbon cables for the keyboard in such a way that you have to reach all the way into the spacing between the keyboard and the motherboard to connect the bloody things. And some extra slack on the ribbon cables to allow servicing, with some room for actually connecting the bloody things easily..? As Carlos Matos would say it - M-m-M, nonoNO!!!
- Oh and let's not forget an old flaw that I noticed ages ago in this turd. The CPU goes straight to 70°C during boot-up but turning on the fan.. again, M-m-M, nonoNO!!! Let's just get the bloody thing to overheat, freeze completely and force the user to power cycle the machine, right? That's gonna be a great way to make them satisfied, RIGHT?! NO MOTHERFUCKERS, AND I WILL DISCONNECT THE DATA LINES OF THIS FUCKING THING TO MAKE IT SPIN ALL THE TIME, AS IT SHOULD!!! Certified fucking braindead abominations of engineers!!!
Oh and not only that, this laptop is outperformed by a Raspberry Pi 3B in performance, thermals, price and product quality.. A FUCKING SINGLE BOARD COMPUTER!!! Isn't that a great joke. Someone here mentioned earlier that HP and Acer seem to have been competing for a long time to make the shittiest products possible, and boy they fucking do. If there's anything that makes both of those shitcompanies remarkable, that'd be it.
2. If I want to conduct a pentest, I don't want to have to relearn the bloody tool!
Recently I did a Burp Suite test to see how the devRant web app logs in, but due to my Burp Suite being the community edition, I couldn't save it. Fucking amazing, thanks PortSwigger! And I couldn't recreate the results anymore due to what I think is a change in the web app. But I'll get back to that later.
So I fired up bettercap (which works at lower network layers and can conduct ARP poisoning and DNS cache poisoning) with the intent to ARP poison my phone and get the results straight from the devRant Android app. I haven't used this tool since around 2017 due to the fact that I kinda lost interest in offensive security. When I fired it up again a few days ago in my PTbox (which is a VM somewhere else on the network) and today again in my newly recovered HP laptop, I noticed that both hosts now have an updated version of bettercap, in which the options completely changed. It's now got different command-line switches and some interactive mode. Needless to say, I have no idea how to use this bloody thing anymore and don't feel like learning it all over again for a single test. Maybe this is why users often dislike changes to the UI, and why some sysadmins refrain from updating their servers? When you have users of any kind, you should at all times honor their installations, give them time to change their individual configurations - tell them that they should! - in other words give them a grace time, and allow for backwards compatibility for as long as feasible.
3. devRant web app!!
As mentioned earlier, I tried to scrape the web app's login flow with Burp Suite, but every time I try to log in with its proxy enabled, it doesn't open the login form but instead just makes a GET request to /feed/top/month?login=1 without ever allowing me to actually log in. This happens in both Chromium and Firefox, on both Windows and Arch Linux. Clearly this is a change to the web app, and a very undesirable one, especially considering that the login flow for the API isn't documented anywhere, as far as I know.
So, can this update to the web app be rolled back or merged with an older version of that login flow, or can I at least know how I'm supposed to log in to this API in order to start developing my own client?
-
I wrote an Azure WebJob to screen-scrape a specific page of disneystore.com and send me an email if an Elsa dress was in stock, so I could buy it for my daughter. This dress would be available for literally seconds at a time.
-
It was 1999. I was just starting my first real job as a programmer for a major insurance company. We were working on code that would screen scrape legacy mainframe data output and convert it to a web-based UI. REALLY stupid project approach I had no input on. I happened to find a programmer in Germany who had released his code in the public domain that would help with making a certain conversion task easier. I downloaded his code and put it to work.
During a code review, a programmer who was probably about 60 asked me where I got the code and what it was doing. I didn't even get to the part about what it was doing because he made fun of me so badly, in a fake German accent in front of a room full of non-programmers, for using code that today is no big deal due to the prevalence of open source. I just clammed up in humiliation because he got everyone laughing at me. His philosophy was if we didn't buy it or write it ourselves, we had no business using it.
I guess I was just ahead of my time?
-
Okay, story time.
Back during 2016, I decided to do a little experiment to test the viability of multithreading in a JavaScript server stack, and I'm not talking about the Node.js way of queuing I/O on background threads, or about WebWorkers that box and convert your arguments to JSON and back during a simple call across two JS contexts.
I'm talking about JavaScript code running concurrently on all cores. I'm talking about replacing the god-awful single-threaded event loop of ECMAScript – the biggest bottleneck in software history – with an honest-to-god, lock-free thread-pool scheduler that executes JS code in parallel, on all cores.
I'm talking about concurrent access to shared mutable state – a big, rightfully-hated mess when done badly – in JavaScript.
This rant is about the many mistakes I made at the time, specifically the biggest – but not the first – of which: publishing some preliminary results very early on.
Every time I showed my work to a JavaScript developer, I'd get negative feedback. Like, unjustified hatred and immediate denial, or outright rejection of the entire concept. Some were even adamantly trying to discourage me from this project.
So I posted a sarcastic question to the Software Engineering Stack Exchange, which was originally worded differently to reflect my frustration, but was later edited by mods to be more serious.
You can see the responses for yourself here: https://goo.gl/poHKpK
Most of the serious answers were along the lines of "multithreading is hard". The top voted response started with this statement: "1) Multithreading is extremely hard, and unfortunately the way you've presented this idea so far implies you're severely underestimating how hard it is."
While I'll admit that my presentation was initially lacking, I later made an entire page to explain the synchronisation mechanism in place, and you can read more about it here, if you're interested:
http://nexusjs.com/architecture/
But what really shocked me was that I had never understood the mindset that all the naysayers adopted until I read that response.
Because the bottom-line of that entire response is an argument: an argument against change.
The average JavaScript developer doesn't want a multithreaded server platform for JavaScript because it means a change of the status quo.
And this is exactly why I started this project. I wanted a highly performant JavaScript platform for servers that's more suitable for real-time applications like transcoding, video streaming, and machine learning.
Nexus does not and will not hold your hand. It will not repeat Node's mistakes and give you nice ways to shoot yourself in the foot later, like `process.on('uncaughtException', ...)` for a catch-all global error handling solution.
No, an uncaught exception will be dealt with like any other self-respecting language: by not ignoring the problem and pretending it doesn't exist. If you write bad code, your program will crash, and you can't rectify a bug in your code by ignoring its presence entirely and using duct tape to scrape something together.
Back on the topic of multithreading, though. Multithreading is known to be hard, that's true. But how do you deal with a difficult solution? You simplify it and break it down, not just disregard it completely; because multithreading has its great advantages, too.
Like, how about we talk performance?
How about distributed algorithms that don't waste 40% of their computing power on agent communication and pointless overhead (like the serialisation/deserialisation of messages across the execution boundary for every single call)?
How about vertical scaling without forking the entire address space (and thus multiplying your application's memory consumption by the number of cores you wish to use)?
How about utilising logical CPUs to the fullest extent, and allowing them to execute JavaScript? Something that isn't even possible with the current model implemented by Node?
Some will say that the performance gains aren't worth the risk. That the possibility of race conditions and deadlocks aren't worth it.
That's the point of cooperative multithreading. It is a way to smartly work around these issues.
If you use promises, they will execute in parallel, to the best of the scheduler's abilities, and if you chain them then they will run consecutively as planned according to their dependency graph.
If your code doesn't access global variables or shared closure variables, or your promises only deal with their provided inputs without side-effects, then no contention will *ever* occur.
If you only read and never modify globals, no contention will ever occur.
Are you seeing the same trend I'm seeing?
Good JavaScript programming practices miraculously coincide with the best practices of thread-safety.
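To make that concrete, here's a small TypeScript sketch (my own illustration, not Nexus code): two pure async functions with no globals and no shared closure state, which is exactly the property that would let a multithreaded scheduler run them on separate cores without contention:

```typescript
// Pure tasks: results derive only from inputs, no side effects, nothing shared.
async function wordCount(text: string): Promise<number> {
  return text.split(/\s+/).filter(Boolean).length;
}

async function checksum(text: string): Promise<number> {
  let sum = 0;
  for (const ch of text) sum = (sum + ch.codePointAt(0)!) % 65521;
  return sum;
}

async function main(): Promise<void> {
  // Independent promises: free to run concurrently (and, on a scheduler like
  // the one described above, genuinely in parallel)...
  const [words, sum] = await Promise.all([
    wordCount("the quick brown fox"),
    checksum("the quick brown fox"),
  ]);
  // ...while chained promises run consecutively, per their dependency graph.
  const report = await Promise.resolve(words).then((w) => `${w} words, checksum ${sum}`);
  console.log(report);
}

main();
```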
When someone says we shouldn't use multithreading because it's hard, do you know what I like to say to that?
"To multithread, you need a pair."18 -
this.isRant === true
Salute to everyone who can handle clients (the dumb ones).
So the client I'm freelancing for gives me this website and asks me to scrape entries out of it. It had about 45 items. I did that and sent the file. The next day he says my file had the wrong data. He wanted data that satisfies X, but the URL given was for Y. The least he could have done was let me know in the first place instead of giving me a random URL to scrape and then blaming me.
-
FUCK THE RECRUITERS WHO ASK US TO MAKE AN ENTIRE PROJECT AS A CODE TEST.
Oh, you need to scrape this website and then store the data in some DB. Apply sentiment analysis to the data set. On the UI, the user should be able to search the fields that were scraped from the website. Upon clicking, it should consume a REST API, which you have to create as well. Oh, and also deploy it somewhere... Oh, I almost forgot: make the UI look good. If you could submit it in one week, we will move towards further rounds if we find you fit enough.
YOU KNOW WHAT, FUCK YOU!
I can apply to 10 other companies in one week and get hired with half the effort of making this whole project for you, which you are going to use on your website, YOU SADIST MOTHERFUCK.
I CURSE YOUR COMPANY WITH AN ETERNITY OF JS CALLBACK HELL 😡😤😣
-
Client : I have a scraping project for you...
Me : Yeah, tell me which site you want me to scrape and what data from it?
Client : I want you to scrape data from 500 sites
Me : 500 sites...are you serious?
Client : Yeah 500 sites...can you do the job?
Me : ok...for 500 sites...the charge will be $500...
Client : Are you out of your mind? $500 for just 500 sites... I can only give you $50
-
Never update the firmware of your delta-fan-driven server when your girlfriend is sleeping. Got thrown out of my own room!
FML
-
I've run into problems with the app I'm working on, problems related to the code.
No, in fact they're related to the last guy who wrote the app: the code has no comments and the variable names make no sense. The only comments in the code are commented-out blocks... with no reason given as to why they were commented out.
I have to add in some checks to determine if a person who has logged in is a full member or not (full members have access to the feature I've added), and the way the guy has made this app work makes no sense to me at all.
I've tried my best to avoid all contact with his code because it makes me want to yell out in frustration.
But for this one case I have to work with what's there.
I know I've mentioned this before but I've hit my limit yet again.
And for those who don't know: this guy managed to scrape together skeleton code from two other apps to make part of this app, and rather than stripping out the code that was specifically made for those other apps, he just left it in (that's the majority of the commented-out code).
One app was a taxi app, and from the looks of it the feature he used was getting the GPS location (which I don't understand, because Google Maps is a thing, after all... the taxi app USES Google Maps). The other app is some sort of funeral webcasting app (I found code imports for it, without any actual code).
I don't understand how this guy could put this together without ever thinking "maybe this is a bad idea".
Always code as if the person who ends up maintaining your code is a violent psychopath who knows where you live
I'm that psychopath right now..... Fuck that guy (don't know where he lives though)
-
me: “Realistically, the only way to pull in this data without replicating and without an API feed is to scrape it from the site”
manager -> to the client: "basically he's got to hack your system to do it"
-
Client : We need real time analysis.
Me : But we can't just scrape thousands of results and process them on user's click.
Client : Don't do that. Real-time analysis is scraping it once and processing it every time the user demands.
Me : Okay
WHAT THE FUCK!!!!!
-
I used to do audits for private companies with a team. Most of them were black-box audits, and we were allowed to physically manipulate certain machines in and around the building, as long as we could get to them unnoticed.
Usually when doing such jobs, you get a contract signed by the CEO or the head of security stating that if you're caught, and your actions were within the scope of the audit, no legal action will be taken against you.
There was this one time a company hired us to test their badge system, and our main objective was to scrape the data on the smartcards with a skimmer on the scanner at the front of the building.
It's easy to get to, as it's outside and almost everyone has to scan their card there in order to enter the building. They used ISO 7816 cards, so we didn't even really need specialized tools or hardware.
Now, we get assigned this task. Seems easy enough. We receive the stay-out-of-jail contract signed by the CEO for company xyz. We head to the address stated on the contract, place the skimmer, etc. etc., all good.
One of our team gets caught fetching the data from the skimmer a week later (it had to be physically removed). Turns out: wrong building, wrong company. This was a kind of office park (don't really know how to say it in English) where all the buildings looked very similar. The only difference between them was the street number, painted on them in big numerals. They gave us the wrong address.
I still have nightmares about this from time to time. In the end, because the collected data was never used and we could somewhat justify our actions (we had that contract, and we had the calls and mails with the CEO of xyz), it never came to a lawsuit. We were, and still are, pretty sure though that the CEO of xyz himself was very interested in the data of that other company and sent us out to the wrong building on purpose.
I don't really know what his plan after that would have been though. We don't just give the data to anyone. We show them how they can protect it better and then we erase everything. They don't actually get to see the data.
I quit doing audits some time ago. It's very stressful and I felt like I either had no spare time at all (when having an active assignment) or had nothing but spare time (when not on an assignment). The pay also wasn't that great.
But some people just really are polished turds.
-
Last year at the Xmas party the CEO slipped in that he wanted the app done by the end of February. I freaked, because I thought he meant both iOS and Android (I'm the only dev working on both :/). Anyway, he wanted specifics for locking out certain people who hadn't paid for some in-house training (in-person training, just not in the app, lol), and that required web development, which I'm horrible at. I spent a whole week and managed to scrape together the right functions to do a user lockout. Pretty good, all things considered.
A couple of weeks before the deadline I'm done :D. I've done a lot of testing, some in-house user testing, changes made, and every bug I could visually find fixed.
Now I've been sitting here waiting. It's an iOS app that is currently complete aside from some legal work. I kept going to my boss, "hey, we need that disclaimer and privacy policy", and he'd be busy for the next few weeks. I pestered him more, pestered another co-worker, and only a week ago did they contact a lawyer...
I'm stuck waiting at a roadblock. Sure, I can develop the Android app, but for the iOS app they want released first, I'm on hold. So annoyed. It's not like I can just put on a lawyer hat and write some shit that says "don't use X unless you agree" and such.
So annoying. For about 2 weeks I just played games on my phone; I was not expecting to waste that much time, lol. I was really expecting the legal stuff to be ready.
Just a side note: the co-worker and boss who needed to sort out this legal stuff knew I needed it to get this done, since I mentioned it repeatedly leading up to completion.
I don't think it'd take too long with Apple when it comes to the review (it's just an update), but I wouldn't put my faith in that as an answer. I just hate that I'm on hold; I was wanting to finish this app and apply for a new job (nothing against the company, it's more that I want to go to a company where I could get a bit of mentoring). But I sit here waiting, working on the Android app. It'd be sad if I finished the Android app before their lawyers got back to me with the legal stuff, though Android is a lot easier for me (I did iOS after completing the majority of the features they wanted on Android, because I was more comfortable working on it).
:/ What a drag -
I am using VSCodium instead of VS Code to see what kind of things change when you scrape the Microsoft out of it. Apparently some tools for .NET Core, like debugging, are locked down and only allowed to run in Microsoft-made IDEs.
I hate the sneaky Microsoft API-lockdown nonsense and will be steering future projects away from any .NET Core development. I thought this was dead in VS Code, but they managed to sneak it in.
-
Harsh truth:
My side SaaS project made more money in its first month (built late winter last year, MVP released after ~3 weeks of development) than the sTaRtUp I work for over its total lifetime so far (built over 3+ months, MVP released in May last year)
...is it time to rage quit?
Often I dream of going full-time solo dev, leaving every idiotic, clueless, fumbling clown behind, but I feel like I just don't have the financial runway to do it. However, even in just a few months in 2021 while I was on the job hunt, I created some side revenue streams which I'm still receiving decent revenue from (selling courses, SaaS products, minor freelancing). I'm just not 100% sure if I was "lucky" during this time period, or if after a few more months of going at it I'd be able to scrape my way towards a meager (though livable!) income.
Give me biased views, devRant!
-
In my wallowing experience as a freelancer, I've noticed that almost all C/C++ clients are perfectionists. You just can't please them by getting the job done quickly.
I got a libcurl job from one the other day to scrape data from a target website and within an hour it was ready. I notified the client and he was both amazed and confused assuming it would take the whole week.
C++Client: The code works but you need to take your time.
Me: Sorry?
C++Client: Yes, it works but you used "string" instead of "wstring"
Me: 😊 Oh okay... *converts strings to wstring*
C++Client: And also variable names should be more descriptive.
Me: 😏 *int foobar => int very_long_descriptive_foobar_01*
C++Client: And also use "shorts" for page nums it'll save some bytes
Me: 😕 *int => short...*
C++Client: And also use forloops instead of whileloops
Me: ☹️ *whileloops => forloops*
C++Client: And also use -- instead of ++ in loops
Me: 😤 *for(... i++) => for(... i--)*
C++Client: And also...
C++Client: And also...
C++Client: And also...
C++Client: And also...
C++Client: And also...
C++Client: And also...
C++Client: And also...
===> Seven "and also" days later <===
Me: *completed 10 Java projects behind the scene*
C++Client: And also use pthread instead of thread
Me: 😧 It's day 7 already!
C++Client: Oh I see, great job. You can compile and send me the archived source.
Me: 🤩
C++Client: And also...
Me: 🏃💨
-
2 things: never symlink the root directory, and don't try to remove a symlink with rm -rf.
Nearly shit my pants today.
-
Full stack developer.
I know what it's supposed to mean, but I feel like it discredits the devs who perfect their area (frontend, backend, DB, infrastructure). To me, it's like calling myself a chef because I can cook dinner.
The depth, analysis, and customization of the domain needed to shape an API for a website is never appreciated. Nor are the finicky tweaks on the frontend to make those final touches. Then comes a brat who says they are full stack and can do all those things. Bullshit. 99.9% of them have never done anything but move data through layers and present it.
Throw these wannabes an enterprise system with monoliths and microservices willy-nilly, orchestrate that shit with vertical-slice nginx SSI, with disaster recovery, horizontal scaling, domain modeling, version management, a busy little bus, and events flowing to all decimal points of 2π. Then, if you fully master everything going on there, I believe you are full stack.
Otherwise you've just scratched the surface of what complexity software development is about. Everyone who can read a tutorial can scrape together an "in-out" website. But if your DB looks the same as your API, and your highest complexity is the alignment of an infobox, I will laugh out loud at your full stack.
And if you told me in an interview that you are full stack, you'd better have 10+ years of experience and a good list of failed and successful projects before I'd let you stay the next two minutes.
-
Ran a script on production to scrape ~1000 sites continuously and update our ~50,000 products from the data, on the same server our site was running on. Needless to say, with traffic plus scraping, our server had almost 100% CPU and RAM usage all the time for 2 weeks until I realised my fuckup.
-
So I own a small business that is a licensee of a few hundred other ones. I wanted a mailing list from the corporate office and they wanted to charge me for it (we already pay them hundreds if not thousands a month). So I wrote a Python script to scrape their website and get the info for free. I love programming!
-
Why do websites have to make their HTML so fucking hard and complicated to read with something like BeautifulSoup? Like, I just want to scrape your data. Fuck your embedded iframes and div lists. Why must you do this? I JUST WANT YOUR DATA
-
Me just now: After two whole days I'm finally able to scrape all the Pokémon images from pokemondb, and now I can start training my CNN.
Buddy: You know they have whole sets of pokemon images on kaggle all labeled ready to go
Me: -
Me: I've not done this before, so any guess would be pure assumption.
Client: Okay, but still, you would have some idea, right?
Me: It might get done in 3 days or may take even 30.
After 3 days:
Client: But you said that it will be done in 3 days. Now you are saying the MVP is not ready. Do you even know your part is the most critical one in the project? We believed in you. We trusted you. This is insane. It was a wrong decision to choose you.
Me (in my head): Didn't I say, this is the first time I am trying to scrape Coles? It might take time?
Me (in actual): I understand, it is getting delayed. I'm trying to get this up ASAP....
Anyone else experienced toxic clients but still didn't lose their cool?
-
When my kids would injure themselves in some minor way (a scrape, a bruise, etc.), I would suggest we cut it off. They would respond, "NO, DAD!"
Now, I go to the doctor because my dentist found something growing on my uvula. My doctor sits me down and says, "Let's cut it off. The whole thing. You will snore less." I am like, "...okay!" Then it reminded me of what I used to tell my kids.
-
I just found out Google's web dev tools let you copy a request as a cURL command!
Time to scrape some websites, baby!
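The trick, roughly: replay the copied request from your own code with the same headers, so the site treats you like the browser. A hedged TypeScript sketch, where the URL, cookie, and user-agent are all placeholders you'd paste in from the copied cURL command:

```typescript
// Replay a request captured from dev tools. All values are placeholders;
// paste the real ones from the "Copy as cURL" output.
async function replay(): Promise<void> {
  const res = await fetch("https://example.com/api/items?page=1", {
    headers: {
      cookie: "session=PASTE_FROM_CURL",
      "user-agent": "Mozilla/5.0 (copied from the captured request)",
      referer: "https://example.com/items",
    },
  });
  console.log(await res.json());
}

replay();
```
-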
A government website that I wanted to scrape data from to make a better app I've actually found to be the pinnacle of a demonstration of what NOT to do...
It contains a JavaScript file that not only has its code copied 3 times (each copy changed the tiniest bit for whichever environment it's on), but ALSO contains the API keys for all 3 environments, AND the APIs it calls from there accept FULL SQL right in the query string...
What. The. Actual. Fuck?!
-
Many years ago, when I moved from a semi-experienced developer to an absolute beginner project manager at another company, my very first project was an absolute clusterfuck.
The customer basically wanted to scrape signups to their EventBrite events into their CRM system. The fuckery began before the project even started, when I was told by management that we HAD to use BizTalk. It didn't matter that we had zero experience with BizTalk, or that using BizTalk for this particular project was like using a stealth bomber to go down to the shops for a bottle of tequila (that's one for fans of Last Man on Earth). It's designed to be used by an experienced team of developers, not the small, inexperienced 1-person dev team I had. The reason was bullshit politics which I wasn't really made privy to (I suspect that our sales team sold it to them for a bazillion pounds, and they weren't using it for anything, so we had to justify selling it to them by doing SOMETHING with it). And because this was literally my first project, I was young and not confident at all, and I wanted to be the guy who just got shit done, I didn't argue.
Inevitably, the project was a turd. It went waaay over budget and time, and didn't work very well. I remember one morning on my way to work seriously considering ploughing my car into a ditch, so that I had a good excuse not to go into work and face that bullshit project.
The good thing is that I learned a lot from that. I decided that kind of fuckery was never going to happen again.
A few months later I had an initial meeting with a potential customer (who I was told would be a great customer to have, for bullshit political reasons). I forget the details, but they essentially wanted to build a platform for academic researchers to store data, process it using data-processing plugins which they could buy, and commercialise it somehow. There were so many reasons why this was a terrible idea, but when they said that they were dead set on using SharePoint (SharePoint!!!) as the base of the platform, I remembered my first project and what happened.
I politely explained my technical and business concerns over the idea, and reasons why SharePoint was not a good fit (with diagrams and everything), suggested a completely different technology stack, and scheduled another meeting so they could absorb what I had said and revisit. I went to my sales and head of development and basically told them to run. Run fast, and run far, because it won't work, these guys are having some kind of fever dream, it's a clusterfuck in the making, and for some reason they won't consider not using SP.
I never heard from them again, so I assume we dropped them as a potential client. It felt amazing. I think that was the single best thing I did for that company.
Moral of the story: when technology decisions are made which you know are wrong, don't be afraid to stand up and explain why.
-
Just my luck.
Step 1: Get given university assignment.
Step 2: Scrape for latest movie reviews with provided API.
Step 3: Accidentally spoil Star Wars. -
Ideas:
1. Scrape GitHub.
2. Attach feature-size estimates (on an abstract scale) as examples across many projects.
3. Use this as prompt/fine-tuning data.
4. Train and prompt on project descriptions relative to feature size and the number of contributors/changes in the changelog.
5. Package and release a model that takes descriptions of ideas and generates reasonable estimates of time and manpower.
6. Optional: sell it as an estimation service to corporates and make money introducing some sanity to the world for a change.
-
The garbage recruiters are trying to sell is insane.
Don’t scrape the bottom of the ocean trying to pass barnacles off as salmon!
Just because someone can make a computer go "beep boop" - and you can't - says more about you than it does about them.
Do they have a single thing in their portfolio that is even a little better than the output of the average “Learn x in y mins” video on youtube? Let that stock simmer for a little longer before you serve it!
Nothing in their portfolio at all you say? They’ve never once written code unless they were forced to? Top talent! Hired!
They scored 80% on your screening test? Wow! My dog scored 90%.
Modern-day snake-oil peddlers, the lot of them.
-
Backstory:
The webpage for (basically) the only movie theater chain is slow. The app, goddamn, is worse.
So I made an app to scrape the data and save it in a SQLite db for my use. However, there is one theater which doesn't belong to the same company. So I decided to also include it in the app.
But it sucks! I still have to find a way to automatically get the data from their shitty site.
-
So I just had this thought that nlegs.com (NSFW) kinda feels like a test.
When I first found it, and it still is, the front-end/layout is basically a BootStrap grid.
It was super easy to scrape.
Then over time, the owner made small tweaks and changes which felt like "oh you guys are still here.... let's make it a bit harder and see who drops out next"
So it got more and more tricky to scrape or fool the site.
But it never became completely unfoolable. I figured if he signed up for Cloudflare, that would probably make it impossible to scrape....
Well I was curious today so did a whois.... And one of the things it mentioned was Cloudflare...
So now I'm like.... Hmmm.... What???!!! Ok.... ¯\_(ツ)_/¯
-
Shadow DOMs – the WORST invention in web standard history.
As a user script and user style developer, I've found the shadow DOM to be a massive headache. Shitow DOMs block custom CSS, block parts of the page from being saved, and block user scripts and browser extensions. Shitow DOMs are an utter nightmare, especially closed ones.
And now, Google Gerrit's entire user interface is shadowdoomed. The only way to save pages locally is to scrape the JSON from the developer tools, but that is not possible on mobile.
-
Through my previous employer's complete incompetence and lack of a spine, I had to work two days during my last holiday. He'd managed to approve time off for all three of the remaining staff at the same time, so as a compromise, our six-day work week was covered by all of us for two days apiece. Sooo, maybe not technically not coding on holiday?
The business could just about scrape by on one staff member, so the boss should've given the holidays to those who requested them first (myself and one other), but that would've caused problems with the third person, who he just so happened to be related to.
I was made redundant a few months later. The company is in a lot of trouble and on its last legs, but the one member of staff who kept their job was the least capable and, surprise surprise, the relative.
-
Does anybody have an idea of what to "code" when you have too much free time? I am done with school and waiting for my university acceptance. No websites.
TL;DR
Project ideas?
-
you know, i've got 5K+ in cash savings before i would even need to dip into "long term" retirement funds
and other than the food and the drink, i don't spend anything (can scrape by on 1-2K per month)
so fuck it, i'm going to enjoy the amazing weather that is everywhere in europe right now
if companies are going to be assholes, i am too
because in the long run i know my skills and vast competences are valuable
companies can hire the cheap clueless scrub in the short run
but i know i'll win in the long run when they realize who they are trying to hire does not exist
what's even one year of being unemployed against a lifetime of opportunity and projects? nothing.
-
my life: dealing with shitty bullshit technologies to scrape together some money so i'll be able to retire for a few years before i die
🤡
I swear a year-long+ (or permanent) sabbatical is on the horizon, I'm utterly sick of this shit
-
Today I came across a very strange thing, or a coincidence (maybe).
I was working on my predictive analytics project. I had registered on Kaggle (a repository for datasets) long back, and I was searching for how to scrape websites, as I couldn't find any relevant dataset. While I was searching for ways to scrape a website, suddenly, after visiting a few sites, I got a notification of a new email. And it was from Kaggle, with the subject line
"How to Scrape a Tidy Dataset for Analysis"
Now I don't know how to feel about it. Mixed feelings! It is either a wild coincidence, or Kaggle is tracking all the pages visited by the user. The latter makes more sense. By the way, Kaggle wasn't open in any of the tabs in my browser.
-
Back in college I started a project to manage my MTG and Yu-Gi-Oh cards. I wanted to have a database for them with a graphical manager (there are already some older ones on GitHub, but I don't like the feel of them).
But between college work, the difficulty of building a SQL database schema for them, and the fact that I had hundreds of cards I'd need to put into the database manually, I dropped the project after the 2 friends working with me also dropped out.
But recently I found this hackster project (https://hackster.io/mportatoes/...), and I'm mostly sure I could retrofit it to use OpenCV to at least read the card title reliably, allowing me to scrape the rest of the information from some wikia page as each new card is scanned. I'd just have to pick up a bin of LEGOs at Walmart, lol.
And I previously learned about MongoDB, which would make storing the cards in a DB a lot easier than dealing with SQL.
I might pick this back up again, but when I first started I had 2 friends working on it with me who both dropped out before I finally gave up, so starting by myself might be a little demotivating. -
Oh my dear internet,
FUCK THIS FUCKING SHIT
I AM SICK AND TIRED OF IT, WHO BUILT THIS HACKED TOGETHER ORWELLIAN SWAMP PIT?
Fuck the same fucking Envato template on every content page with 70 layers of sidebars, inline ads, popups, cookies and content shifting as if I was playing CATCH UP WITH YOUR FUCKING CONTENT.
FUCK the same fucking annual upselling 'plans' on every 7-day trial overengineered scam app that requires me to sign up for 1 fucking, falsely advertised task where my fucking password generator doesn't even recognize the input as a password field so I have to cmd+, to my FUCKING BABYLONIAN PASSWORD ARCHIVES PROMPTING ME FOR THE MASTER PASSWORD.
Thank god I can at least CREATE A BURNER CREDIT CARD THAT FREEZES ITSELF BECAUSE I CANNOT BE BOTHERED TO UNSUBSCRIBE FROM YOUR FUCKING STEAMING CRAP.
FUCK every fucking step I take being recorded by our CYBERPUNK OVERLORDS REQUIRING ME to sign up for 5 different fucking privacy protection tools' annual plan or duct tape some open source shit onto my browser just for some BASIC PRIVACY WHILE TRYING TO NAVIGATE ALL THE OTHER 5000 annuals plan naval mines like A FUCKING FRENCH SUBMARINE IN 1940 GERMAN WATERS.
FUCK my walled garden scam ecosystem not being compatible with your walled garden scam ecosystem prompting me to reactivate my old SATANIC GOOGLE DON'T BE EVIL ACCOUNT from 2012 sending me on a DANTE ALIGHIERI STYLE ODYSSEY THROUGH THE 9 LAYERS OF PASSWORD RESET QUESTIONS, UNEXPECTED ERROR, 2FA MY PHONE DIED HELL to come out on the other side as a broken man.
Thank GOD I have your useless SUPPORT PAGE to aid with my signup problems that is actually just an FAQ with a hidden EASTER EGG HUNT for your support form CRISP AI BOT THAT IS ALSO 'currently experiencing high demand due to COVID' which is peculiar since that has been 3 years ago, but fortunately for you enabled you to fire ALL YOUR SUPPORT STAFF AND REPLACE IT WITH THIS BANNER.
I might as well just SCRAPE your fucking content, it'd be faster.
And although it is quite funny, FUCK THIS PAGE TOO for having me create another of 10.000 accounts to write this shit, where my browser firmly placed a newly created burner email into the PASSWORD FIELD.
I do not know how we managed to create something that is even more unwieldy than 56k DIAL-UPS, but I know that if this shit continues I'll have to train my own AGI to proudly interact with all of this STUPID SHIT on my behalf, or I'll have to move into THE FUCKING MOUNTAINS AND LIVE WITH THE DEER.
-
So my phone got stolen last night. FOR FUCK'S SAKE. I scrape together enough money to get another cheap phone after the last one broke, and now I need YET ANOTHER ONE WHEN I HAVEN'T GOT A JOB AND CAN'T APPLY FOR ONE WITHOUT ARGRFHUJGHIOSDJGBH:USKDGHISD:
-
I've been creating a new website layout for a small theatre group. I like how it's coming together :)
It is designed with user interaction in mind, because the layout they are using now is... well, thrown together. It uses images as buttons, I've had a few friends of mine ask "where do I get tickets?", and it is not updated regularly.
I plan to automate a few things: for shows, if the show date has passed, it will remove itself from the home page. I also plan to web scrape, if the site permits, some information to be displayed on the show page.
I've got lots to do before I can show them what I've made.
-
The service I needed to integrate into our system had such poor documentation, and a separate pricing tier to access their APIs...
... Not having it. Used Guzzle to perform both the authentication and the search-page request, then wrote a function to scrape the result.
Job done. 😎 And yes, I have no shame in saying I love PHP.
-
upside I guess, if your website's content is generated by JavaScript I'm too lazy to find an emulator to scrape it off you
granted, I'm sure this website's business would've been much better off had they made it scrapable, since I'm quite literally trying to retrieve the capabilities listed in one of their help pages and give myself alerts when they gain new features
but no, I guess
-
The customer wants to migrate his old store into WooCommerce. Here's a MySQL dump with 130 tables and no documentation on how they're related.
You also have to scrape all of the couple thousand product images off their site, because they don't want their old dev knowing, so you can't just have FTP access...
-
Wonderful experience today
I'm scraping data from an old system, saving that data as JSON, and my next step is transforming the data and pushing it to an API (thank god the new system has an API).
Now I stumbled upon an issue: I found it a bit hard to retrieve a file with the scraper library I'm using, and it was also quite difficult to set specific headers to download the file I was looking for instead of navigating to the index of the website. Then I tried a built-in language function to retrieve the files I needed during the scrape - no luck, 'cause I had to log in to the website first.
I didn't want to use a different library since I worked so hard and got so far.
My quick solution: perform a GET request to the website, borrow the session ID cookie, and then use the built-in function's HTTP headers functionality to retrieve the file.
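In TypeScript terms the workaround looks something like the sketch below (the rant doesn't name the language or endpoints, so the login URL, field names, and cookie handling are all assumptions):

```typescript
// Borrow a session cookie from a login request, then reuse it to fetch a
// file the site only serves to logged-in users.
async function downloadWithBorrowedSession(
  loginUrl: string,
  fileUrl: string
): Promise<Buffer> {
  // 1. Log in; credentials and field names here are placeholders.
  const login = await fetch(loginUrl, {
    method: "POST",
    body: new URLSearchParams({ username: "user", password: "pass" }),
    redirect: "manual", // keep the Set-Cookie from the login response itself
  });
  const cookie = login.headers.get("set-cookie")?.split(";")[0];
  if (!cookie) throw new Error("no session cookie issued");

  // 2. Request the file with the borrowed cookie attached.
  const file = await fetch(fileUrl, { headers: { cookie } });
  if (!file.ok) throw new Error(`download failed: ${file.status}`);
  return Buffer.from(await file.arrayBuffer());
}
```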
Luckily this is a throwaway script so being dirty for this once is OK, it works now :) -
Hey, know that joke where people say it ran yesterday but for some reason it doesn't run the next day? The same thing happens to me here with Hecker (a Hacker News 'client' written in Go that I am currently working on)... Oh wait a second, it works again!
Btw, if you care about this, the error seems to be a JSON error, which means that one of the submissions the program scrapes has a malformed JSON format, and the error is an invalid-character error. Bruh.
-
So I have question about my resume.
During my college time, I did two projects related to politics:
One was to analyze media bias. What I did was scrape news coverage of Trump and Hillary during the election year and run sentiment analysis on it. The result is not surprising: among the NY Times, NBC, Fox, Washington Post, and CNN, Fox News clearly favored Trump, which makes sense since Fox News is a Republican news site.
The other project was to analyze the speech complexity and sentiment of the election. One of the observations we made was that Hillary and Trump are almost at the same level regarding speech complexity. However, Trump has a more positive sentiment in his speech, which is believable considering how much he loves to say "make America great again".
Now the question is, when I gave my advisor my resume, she said that I'd better not put those two projects on my resume since they are related to politics.
But I am applying for a data science master's degree. Seriously, I was just collecting the data, and the data speaks for itself. Why should I take those projects off my resume? As a matter of fact, I'm very proud of those projects.
So here is the question: shall I take those two projects off my resume because they were political, or should I leave them there? I really need some professional views. Please.
-
Dissertation is about dark net markets, was proposed as a research project but I decided to make some scrapers to add in a programming aspect.
Unsurprisingly dark net markets are a bitch to scrape.
At least I'm learning stuff :p. -
How many keywords are appropriate to put in a "skills" section on a resume?
Technically I've played with a lot of tech and stacks, and done tiny one-offs, tutorials, and independent projects, but nothing that was more than a day's work on any one of them.
Basically I'm fast at picking up a language and API and just rolling with it and getting something done, even without tutorials or tons of googling. Though I find myself constantly relying on manuals and reading APIs.
Is this normal, or should entry-level devs be familiar with an API from the get-go?
I see a lot of people say to game the system just to get your foot in the front door past the automated keyword filters and on to an interviews where the real requirements are listed.
But I'd rather not list under the skill section something I only used for all of ten hours in one or two sittings.
Also, is it acceptable to list a "learning", "would like to learn/know more of", or "planned skill additions" section?
Also, what do I add for extras? "Achievements"? "Volunteer work"? "Hobby projects"? "Pastimes"?
Is any of this seen as necessary or well-rounded?
If it is really just about the numbers, I'll just go scrape junior and entry-level positions, take their keywords, and automatically fill out template resumes to automate applying.
I could even use SQLite to store the results and track progress, lol.
I've never worked as a professional programmer, but it's the only thing I've ever enjoyed doing for 12 hours a day.
-
Need suggestions for Nutch vs. StormCrawler. I have to scrape information for 5000 companies listed on the BSE.
-
I thought my code was bad and that was why it was taking twice as long to run as any other group's.
No, it's just that Illinois, the state my group was assigned, has almost 2000 more data rows to scrape than any other group's state. My code wasn't running slow; it just had more to run through.
I've spent 4 days trying to fucking refactor and improve my code, ignoring clean code and attempting clever code to run faster, and now I need to revert back to clean code, since no one else in my group would be able to understand or work on the damn file if I left it at clever.
Fucking hell 😫
-
me: thinking about scraping for a webapp
Random Guy: walks past me, staring me dead in the eyes.
Me, out loud: "how do i scrape" -
I'm freelancing with a company that basically told me to scrape some data from some websites and do some aggregations on it. Wish my full-time job was as easy 🤣
-
Yet another WordPress rant... But..
I normally don't develop for WordPress, so I didn't know that every WordPress site has a REST API enabled by default.
Which makes it super easy to scrape content, remove ads, and so on..
Just add /wp-json/wp/v2/pages/ to any URL that's running WordPress..
It also works with media uploads, comments... everything...
https://jamieoliver.com/wp-json/wp/...
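A hedged sketch of what that makes possible: pulling every published page from a stock WordPress install via the wp/v2 endpoint (the pagination behaviour, a 400 once you page past the end, is standard WP core):

```typescript
type WpPage = { id: number; link: string; title: { rendered: string } };

// Walk the default REST API of any WordPress site, 100 pages per request.
async function fetchAllPages(site: string): Promise<WpPage[]> {
  const all: WpPage[] = [];
  for (let page = 1; ; page++) {
    const res = await fetch(`${site}/wp-json/wp/v2/pages?per_page=100&page=${page}`);
    if (!res.ok) break; // WP returns 400 once you page past the last result
    const batch = (await res.json()) as WpPage[];
    all.push(...batch);
    if (batch.length < 100) break;
  }
  return all;
}

fetchAllPages("https://example.com").then((pages) =>
  pages.forEach((p) => console.log(p.id, p.title.rendered, p.link))
);
```
-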
Playing around with a POC I'm doing for work, and it works so well I got an IP ban from one of my favorite websites for a massive amount of requests they got from me
-
Can someone give me any ideas on sites that have a lot of textual data worth scraping in mass quantities? I'm trying to scratch a few itches.
My current ideas are scraping Amazon, Indeed, and Twitter. But I'd like to scrape more, and maybe not so many FAANG-related companies.
-
Is there a practical way to predict the crowd density of a place in real-time?
I was thinking of some way to scrape social media activity and use the geolocation tags to predict the crowd in that particular area.
But I am looking for a more accurate alternative!
Please help!!
All ideas are welcome
-
I had a discussion - no, it was more a lobotomy - with one of our "experts"
I was kinda confused, as he had several Grafana tabs open and a query editor...
He explained to me that he debugs and optimizes his queries based on the Grafana data....
An Elasticsearch cluster with several hundred different indices, >20 TB of data.
I explained to him the scrape interval of 5 seconds, that he cannot distinguish his query from other queries, that there is far too much interference... let alone that a 5-second scrape interval is a very loooong time.....
Nope. It makes perfect sense to him and he'll continue to work like this. -
Hello, I have a question for anyone familiar with multithreading!
I just started working with threading for the first time - I mostly write PowerShell scripts 😅 - and I found that under certain conditions multithreading is an absolute time saver. And of course for some tasks it's not such a big deal.
I am currently working on a project that runs multiple threads and each thread might invoke one of my functions that also threads the work.
I'm a total newb when it comes to this stuff, but if my main process is 4 threads, and each can spin up up to 4 more threads to run one of my functions, does the math work out to a possible total of 16 threads, or is it possible for the threading to go ape-shit bananas and utterly thrash the CPU with rampant thread creation?
I've looked online, and based on some of the info I've managed to come across on my own, the answers allude to it being safe because I'm creating pools for running the threads first and the pool is responsible for maintaining min/max threads, but I can't seem to find good info on running a pool+threads inside another thread.
Just to let you in on what the function does that requires threading in the first place: I need to query CloudTrail based on ARNs to find events, but I can only pass a single ARN to the Find-CTEvent cmdlet. So I'm essentially making 1500-ish really, really small calls to AWS just to get back event data per ARN.
Serially, this takes almost 20 minutes; on my laptop, using stupid settings like 24 threads, it completes in about 95 seconds. On the actual server that will be running this code, I'm going to limit it to 4 threads and try to figure out a way to cache the info locally and update it on a cron or schedule, so only the initial scrape takes forever and the updates can be done nightly or something.
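For what it's worth, here is the bounding logic in miniature - a TypeScript sketch rather than PowerShell, with promises standing in for pool threads, since the principle is the same: if every spawn goes through a pool that caps its own concurrency, 4 outer workers each running a 4-wide inner pool tops out at 4 × 4 = 16 concurrent units, never a runaway:

```typescript
// Toy bounded pool: at most `size` tasks in flight; the rest wait their turn.
class Pool {
  private waiters: (() => void)[] = [];
  private free: number;
  constructor(size: number) { this.free = size; }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.free > 0) this.free--;
    else await new Promise<void>((wake) => this.waiters.push(wake));
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) next(); // hand the slot straight to the next waiter
      else this.free++;
    }
  }
}

async function main(): Promise<void> {
  const outer = new Pool(4); // the "main process" workers
  const jobs = [...Array(8)].map((_, i) =>
    outer.run(async () => {
      const inner = new Pool(4); // each worker's own 4-wide pool
      const arns = [...Array(20)].map((_, j) => `arn:aws:demo:${i}:${j}`); // fake ARNs
      // Worst case in flight here: 4 outer × 4 inner = 16, bounded by design.
      await Promise.all(arns.map((arn) => inner.run(async () => arn.length)));
    })
  );
  await Promise.all(jobs);
}

main();
```

In PowerShell the equivalent guarantee would come from the runspace pool's min/max settings, as the answers you found suggest; the thing to verify is that the inner function creates its pool with a hard cap rather than spawning raw threads.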
Thank you in advance for your help. I'm not too sure if the question is dumb, but please let me know either way!
-
I need suggestions
I’m thinking about making a blog called but how do I, this will include tutorials that covers things not taught in school, but you wished you knew how to do.
So right now I have ideas like:
How to write zsh plugins
How to scrape the web (parsing HTML or sending requests)
How to write chrome plugins
How to center a div in different ways
How to write backend code in JS
How to set up an interactive website on a server with a domain
But I need more. I need suggestions.
-
Twitter disclosed a bug on its platform that impacted users who accessed the platform using Firefox.
According to ZDNet's report, Twitter stored private files inside the Firefox browser's cache (a folder where websites store information and files temporarily). Twitter said that once users left their platform or logged off, the files would remain in the browser cache, allowing anyone to retrieve them. The company is now warning users who share systems or used a public computer that some of their private files may still be present in the Firefox cache. Malware could be used to scrape and steal this data.
-
Why do clients expect that they'll get a high-quality machine learning model without a properly cleaned dataset? I usually get the response, 'Just scrape data and train it. It shouldn't take long.'
-
A C# remote procedure call library.
Made years ago, when I had no real science and engineering knowledge.
https://github.com/scrapes/... -
I don't even really know where to start, so I figure I'll just throw this out there and see where it goes.
My daughter is disabled. She's in sports and dance, but it's taken my wife and I years to find out about the organizations she's now in, and that's mostly through word of mouth. Other families have told us because they've had the years of experience that we didn't. And now we're passing the information on to other less experienced families. And that's a problem that everyone we've talked to agrees upon: there's really no good way of discovering what organizations are out there, and what they can help with.
There exist some sites out there like https://challengedathletes.org/reso... which are really just lists of sites, with really nothing more to indicate that this group has wheelchair basketball, that group has adaptive ballet, that kind of thing. So I'm thinking, what if I built a site that provided an index? Searchable, faceted, like Algolia or AWS CloudSearch. That part I can do. But how would I go about gathering the information? Could I somehow scrape it? If so, how do I organize it? Do I crowdsource it by petitioning /r/disability, the Facebook support groups my family belongs to, and other places across the interwebs?
I can design the data model. I can build the web app. I can make it fast and pretty and easy to use. But how do I get the data?
-
So it turns out the site my app scrapes for those NSFW pictures actually scrapes another site.
Now it just seems to mirror that site like a proxy, though it doesn't work well... pictures not loading, links not working.
But then at the bottom there's like a Copyright tag which shows the other site's name.
I wonder if perhaps he got tired of playing cat and mouse with me and just said, ah, screw it... I give up, here's the source, go scrape them.
-
Because the company scrapped the whole project, I've been assigned to do backend Laravel. After struggling for months, I've been reassigned again, this time to help another front-end project written in ReactJS. I don't know who I am anymore, and I haven't touched Android development for months 😢