Re: the DOJ emails prefixed with "EFTA", I have no idea how over-redacted they are. They definitely seem dubious though.
Re: the DDoSecrets emails though (YAHOO dataset), I have more to share.
Drop Site News agreed to give us access to the Yahoo dataset discovered by DDoSecrets, but on the condition that we help redact it. It's a completely unfiltered dataset. It's literally just .eml files for [email protected]. It includes many attached documents. There is no illegal imagery, but it has photos of Epstein's extended family (nephews, nieces, etc) and headshots of many models that Epstein's executive assistant would send to him. I was quite shocked that this thing existed.
We built some internal redaction tools that the Drop Site team is now using to comb through all of this. We've released 5 batches of the Yahoo mail now, with the 1k+ Amazon receipts being the most recent.
Unlike the DOJ, we've tried to minimize the ambiguity about what was redacted.
For example: all redacted images are replaced with a Gemini-generated description of that photograph.
Another example: we are aggressively redacting email addresses and phone numbers of normal people to avoid spamming them. Perhaps others would leave it all in, but Riley and I don't want to be responsible for these people's lives getting disrupted by this entire saga. For example, we redacted this guy's email but not his name: https://www.jmail.world/thread/4accfb5f3ed84656e9762740081a4...
Riley and I were not expecting this type of scope when we first dropped Jmail. Jmail is an interesting side project for us, and this new dataset requires full-time attention. Thankfully, we have help. We're happy to take on this responsibility given how helpful, thoughtful, and careful both the Drop Site and DDoSecrets teams have been here.
Luckily umami in Docker is pretty compartmentalized. All data lives in the DB, and the DB runs in another container. The biggest thing is the DB credentials. The default config requires no volume mounts, so no worries there. It runs unprivileged with no extra capabilities. IIRC the container doesn't even have bash; a few of the exploits that tried to run failed because the scripts they ran required it.
Deleting and remaking the container will blow away all state associated with it. So there isn't a whole lot to worry about after you do that.
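In case it's useful to anyone, here's a minimal sketch of that reset using the Python Docker SDK. The container name, image tag, network, and env vars are placeholders, so substitute whatever your actual deployment uses (and rotate the DB credentials while you're at it):

```python
# Sketch only: tear down a possibly-compromised umami container and recreate
# it from a clean image. Requires the Docker SDK for Python (pip install docker).
import docker
from docker.errors import NotFound

client = docker.from_env()

# Remove the old container. With no volume mounts, this discards any state an
# attacker could have left inside it.
try:
    old = client.containers.get("umami")          # placeholder container name
    old.stop()
    old.remove()
except NotFound:
    pass

# Recreate it from a clean image, pointing at the separate DB container.
client.containers.run(
    "ghcr.io/umami-software/umami:postgresql-latest",   # placeholder image tag
    name="umami",
    detach=True,
    network="umami-net",                                 # placeholder network
    environment={
        # Rotate these, since the DB credentials are the main thing worth stealing.
        "DATABASE_URL": "postgresql://umami:NEW_PASSWORD@umami-db:5432/umami",
        "APP_SECRET": "NEW_RANDOM_SECRET",
    },
)
```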
Why not just release the software after your set threshold of time, rather than opening it up now under such a license? To get eyes on it beforehand?
Also, how does this work with outside contributions? Does the owning SaaS get the benefit of contributor work instantly while everyone else has to wait 2 years? What about the contributors themselves?
presumably because a) it still allows the source code to be available and used for the 'permitted purposes' (i.e. anything that's not directly competing), and b) it represents a concrete commitment to open up, not just a pinkie promise (even if they were to have a license or contract which promised it, it would not be as easy to rely on as actually having the source code published. Companies have reneged on such promises before).
And yeah, by my reading, people can essentially contribute code or publish patches (under just a plain MIT license, in principle); it's just that the original and derivatives still can't be used for non-permitted purposes until the timer is up.
You may want to allow certain uses (self-hosting, etc) even before it transitions to a fully open-source license. Having access to the source code can also help SaaS users debug certain situations.
This does not work; I've tried it before. Google verification, for example, would not accept my Twilio number (this was about 2 years ago). You can look up a phone number's provider and line type, and numbers from Twilio and similar providers tend not to be accepted.
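For anyone curious, the kind of lookup I mean looks roughly like this with Twilio's Lookup v2 API (field names are from memory, so double-check the current docs; the credentials and number below are placeholders):

```python
# Sketch: ask a lookup service what kind of line a number is. Verification
# flows commonly reject anything that isn't a plain mobile line, which is why
# VoIP numbers from Twilio and similar providers tend to fail.
import requests

ACCOUNT_SID = "ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"  # placeholder
AUTH_TOKEN = "your_auth_token"                      # placeholder

def line_type(number: str) -> str:
    resp = requests.get(
        f"https://lookups.twilio.com/v2/PhoneNumbers/{number}",
        params={"Fields": "line_type_intelligence"},
        auth=(ACCOUNT_SID, AUTH_TOKEN),
        timeout=10,
    )
    resp.raise_for_status()
    info = resp.json().get("line_type_intelligence") or {}
    return info.get("type", "unknown")  # e.g. "mobile", "landline", "nonFixedVoip"

print(line_type("+15005550006"))  # placeholder number
```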
I just moved my scrobbling to a self-hosted instance of Koito after switching from Spotify to Jellyfin. Very happy with the change, as I can still share all my music data with friends.
How are we going to keep using computers as individual users? Is it over? You won't be able to buy parts first-party anymore if things keep going this way.
> LLMs are especially good at evaluating documents to assess the degree that an LLM assisted their creation!)
That's a bold claim. Do they have data to back it up? I'd only have the confidence to say this after testing it against multiple LLM outputs. Does it really work for, e.g., the em dash leaderboard of HN, or for people who tell an LLM not to use these 10 LLM-y writing cliches? I would need to see their reasoning for why they think this before I believe it.
I am really surprised that people are surprised by this, and honestly the reference was so casual in the RFD because it's probably the way that I use LLMs the most (so very much coming from my own personal experience). I will add a footnote to the RFD to explain this, but just for everyone's benefit here: at Oxide, we have a very writing-intensive hiring process.[0] Unsurprisingly, over the last six months, we have seen an explosion of LLM-authored materials (especially for our technical positions). We have told applicants to be careful about doing this[1], but they do it anyway. We have also seen this coupled with outright fraud (though less frequently). Speaking personally, I spend a lot of time reviewing candidate materials, and my ear has become very sensitive to LLM-generated materials. So while I generally only engage an LLM to aid in detection when I already have a suspicion, they have proven adept. (I also elaborated on this a little in our podcast episode with Ben Shindel on using LLMs to explore the fraud of Aidan Toner-Rodgers.[2])
I wasn't trying to assert that LLMs can find all LLM-generated content (which feels tautologically impossible?), just that they are useful for the kind of LLM-generated content that we seek to detect.
I still don't quite get this reasoning. A statistical model for detecting a category (is this written hiring material LLM-generated or not, is this email spam or not, etc.) is best measured by its false positive and false negative rates. But it doesn't sound like anyone measures this; it just gets applied after a couple of "huh, that worked" moments and we move on. There's a big difference between a model that performs successfully 70% of the time and one that performs at 99%, and I'm not sure we can say which this is.
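Concretely, the measurement I mean is something like this toy sketch, where `detector` is a hypothetical stand-in for whatever LLM-based check is being used:

```python
# Toy sketch: run the detector over a labeled set of materials and report the
# false positive and false negative rates. `detector` is hypothetical -- any
# callable that takes text and returns True for "LLM-generated" works.

def evaluate(detector, labeled_samples):
    """labeled_samples: iterable of (text, truly_llm_generated: bool) pairs."""
    fp = fn = tp = tn = 0
    for text, truth in labeled_samples:
        verdict = detector(text)
        if verdict and not truth:
            fp += 1   # human writing flagged as LLM output
        elif not verdict and truth:
            fn += 1   # LLM output that slipped through
        elif verdict:
            tp += 1
        else:
            tn += 1
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr
```

Without numbers like these, "huh, that worked" on a handful of cases doesn't distinguish the 70% model from the 99% one.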
Maybe if LLMs were aligned for this specific task it'd make more sense? But they're not. Their alignment tunes them to provide statistically helpful responses across a wide variety of tasks. They prefer positive responses to negative ones and are not tuned directly as a detection tool for arbitrary categorization. And maybe they do work well, but maybe it's only a specific version of a specific model against other specific models' hiring-material outputs? There are too many confounding factors here to reach this conclusion without studying it rigorously, which is why it felt... not carefully considered.
Maybe you have considered this more than I know. It sounds like you work a lot with this data. But the off-handedness set off my skepticism.
I debated not writing this, as I planned on re-applying (Oxide is in many ways a dream company for me) and didn't want this to hurt my chances if I could be identified and it was seen as negative or critical (I hope not; I'm just relaying my experience as honestly as I can!), but I felt like I needed to make this post (my first on HN, from a longtime lurker).
I applied in the last 6 months and, against my better judgement, encouraged by the perceived company culture, the various luminaries on the team, the varied technical and non-technical content on the podcasts, and my general (unfortunate) propensity for honesty, I was more vulnerable than normal in a tech application and spent many hours writing it. (FWIW, it's not super relevant to what I'll get to, but you can and should assume I am a longtime Rust programmer (since 1.0) with successful open source libraries, even ones used by Oxide, but also a very private person: no socials, no blogging, etc. So, much to my chagrin, I assumed I would be a shoo-in :))
After almost 3 months, I was disappointed (and surprised, if I'm being honest; hubris, indeed!) to receive a very bland, uninformative rejection email stating they had received too many applications for the position (still not filled as of today!), would not proceed at this time, and that I was welcome to re-apply, etc.
Let me state: this is fine, this is not my first rodeo! I have a well-paying, albeit at the moment unchallenging, job at a large tech company (taking the Oxide job would have been a significant pay cut, but that's how much I wanted to work there!). What I found particularly objectionable was that my writing samples (URLs to my personal samples) were never accessed.
This is, or could be, a signal for a number of things, but what was particularly disappointing was the heavy emphasis on writing in the application packet and the company culture, as reiterated, e.g., by the founder I'm replying to, and yet my writing samples were never even read? I have been in tech for many years, seen all the bullshit in recruiting and hiring, and performed many interviews myself, so it wouldn't be altogether surprising if a first-line recruiter threw a resume into the reject pile for <insert reasons>. But then I have so many other questions: why the 3-month delay if it was tossed quickly? And if it truly was read by the/a founder or heavily scrutinized, as somewhat indicated by the post, why did they not access my writing samples? There are just more questions now.
All of this was bothersome and, if I'm being honest, made me question joining the company. But what really made me write this response is that I am now worried, given the content of the post I'm replying to, whether my application was flagged as LLM-generated. I don't think my writing style is particularly LLM-ish, but in case that's in doubt, believe me or not: my application and this response do not contain a single word from an LLM. This is all, sui generis, me, myself, and I. (This doesn't quite explain why my samples weren't accessed, but if I'm being charitable, perhaps the content of the application packet seemed of dubious provenance?)
Regardless, if it was flagged, I suppose the long and short of this little story is: are you sending applicants rejection letters noting this suspicion, at least as a courtesy? If I were the victim of a false positive, I would at least like to know.
This isn't some last-ditch attempt to get re-evaluated (the rejection was many months ago); I have a job, and I can reapply in my own time. Even if this was an oversight or mistake (although not accessing the writing samples at all is somewhat of a red flag for me), there is no way they can contact me through this burner account. It's just, like, the principle of it, and the words needed to be said :)
Thank you, and PS, even through it all, I (perhaps now guiltily) still love your podcast :D
I had a very similar experience, except I got the automated email after two months, not three — you sound like a stronger candidate, so maybe that's why I got rejected sooner, which'd be fair enough. Still, spending about a week's worth of evenings between the suggested materials, reflecting, writing, and editing 15 pages for one job application and having zero human interaction feels uniquely degrading.
I disagree with your point about that being fine. I think it's not good enough to replicate the bare minimum of what the rest of the industry does while asking for so much more from candidates.
A standard, customized, well-researched cover letter takes an order of magnitude less effort. When it gets a cookie-cutter rejection from someone who spent a few seconds on the CV, that's at least understandable: the effort they'd spend writing a rejection (or replying) is higher than the effort they spent evaluating the application.
With Oxide, however, Bryan made a point that they "definitely read everyone's materials" [1], which means reading at the very least five pages per candidate. If that's still the case, having an actual human on the other side of the rejection would add a very small amount of time to the whole process, but the company decided to do the absolute least possible. It's a choice, and I think this choice goes against their own principle of decency:
"We treat others with dignity, be they colleague, customer, community or competitor."
I wish Oxide the best of luck. They have lots of very smart, very driven people that I'd love to work with, and I love what they are doing. I hope this feedback helps them get better.
I understand your disappointment; we are very explicit about why we provide so little feedback.[0] I disagree that it's indecent; to the contrary, we allow anyone to shoot their shot, with the guarantee that they will be thoughtfully considered.
Indeed, I understand your reasoning; you talk about that in the podcast and in the RFD. That's why I wasn't talking about the lack of feedback but about the lack of human interaction. While there is nothing constructive to be done about the disappointment of rejection, this part is very much in your power to change, which is why I think it's constructive feedback and not just venting.
That said, the RFD does say this:
> Candidates may well respond to a rejection by asking for more specific feedback; to the degree that feedback can be constructive, it should be provided.
Even just a reply refusing to provide feedback would still be more humane and decent.
Your materials were absolutely read (and indeed, RFD 576 makes clear that LLMs are not a substitute for reading materials). If you have writing samples that were external links, I can't guarantee that they were clicked through though: in part because the materials themselves constitute a galactic writing sample, we may have not clicked through because we were already at a decision point before reading your external writing. As for more specific feedback, if you can DM me, I'll see if I can give you more specific feedback -- but as we explicitly indicate in RFD 3[0], we are very limited in what we can provide.
As for your application getting flagged as LLM-generated: we in fact don't flag such applications (we just reject them), and it's very unlikely that we felt that yours were LLM-generated. (We really, really give applicants the benefit of the doubt on that.)
All of that said: absolutely no one is a shoo-in at Oxide. If you genuinely thought that (and if your materials reflected that kind of overconfidence), it may well have guided our decision. We are very selective in terms of hiring -- and we are very oversubscribed. Bluntly: it's very hard to get a job at Oxide. I know this seems harsh and/or unjust or unfair, but this is the reality. As we told you in the letter we sent you, we already have people at Oxide who prevailed on subsequent applications, because they found a job that's a better fit for them, or they had vastly improved materials (or both). Finally, you can also take solace in knowing that your post here in no way hurts your future chances at Oxide, and we look forward to reading your materials should you choose to apply in the future.
I thought about it: a quick way to check whether something was created with an LLM is to feed an LLM half of the text and then let it complete the rest token by token. At every step, check not just the single most likely next token but the n most probable tokens. If one of them is the token actually in the text, pick it and continue. This way, I think, you can measure how "correct" the model is at predicting the text it hasn't yet seen.
I didn't test it and I'm far from an expert, maybe someone can challenge it?
That seems somewhat similar to perplexity-based detection, although you can just get the probability of each token instead of picking the n best, and you don't have to generate.
It kinda works, but is not very reliable and is quite sensitive to which model the text was generated with.
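A rough, untested sketch of both variants (the plain per-token probabilities and the top-n containment check proposed above), assuming the Hugging Face transformers library with GPT-2 as the scoring model (any causal LM would do):

```python
# Sketch: score how "predictable" a text is to a given causal LM.
# Lower average NLL / higher top-n hit rate = more predictable to this model,
# but as noted, the score depends heavily on which model generated the text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def predictability(text: str, model_name: str = "gpt2", top_n: int = 5):
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits            # (1, seq_len, vocab)

    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]                      # the tokens actually in the text

    # Perplexity-style score: average negative log-likelihood per token.
    token_log_probs = log_probs[torch.arange(targets.numel()), targets]
    avg_nll = -token_log_probs.mean().item()

    # Top-n containment: how often is the real token among the model's
    # n most likely continuations?
    top = log_probs.topk(top_n, dim=-1).indices
    hit_rate = (top == targets.unsqueeze(-1)).any(dim=-1).float().mean().item()

    return avg_nll, hit_rate
```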
I expect that, for values of n for which this test consistently reports "LLM-generated" on LLM-generated inputs, it will also consistently report "LLM-generated" on human-generated inputs. But I haven't done the test either so I could be wrong.
I would be surprised if they have any data about this. There are so many ways LLMs can be involved, from writing everything to making text more concise or just "simple proofreading". Detecting all of this with certainty is not trivial, and probably not possible with the tools we currently have.
I wish the author had included examples of what it's like to write ReactJS in Rust. From what's in the docs currently, I can't get a good idea of how succinct or structured it is to write, for example, a button handler on an element.
After looking through the author's post history, the title seems incorrect. This does not seem to be "ReactJS in Rust", but more something Express-like in Rust. They've been spamming this project over the past 50 days or so with different conflicting names, seemingly to try to gain traction[0].
GitHub Actions left a bad taste in my mouth after it randomly removed authenticated workers from the pool once they had been offline for ~5 days.
This was after setting up a relatively complex PR workflow (an always-on cheap server starts up a very expensive build server with specific hardware), only to have it break randomly after no PR came in for a few days. There was no indication that this happens, and no workaround from GitHub.
There are better solutions for CI; GitHub's is half-baked.
Roll 2d6, sum result. Your CI migration target is:
2. migrate secret manager. Roll again
3. cloud build
4. gocd
5. jenkins
6. gitlab
7. github actions
8. bamboo
9. codepipeline
10. buildbot
11. team foundation server
12. migrate version control. Roll again
Not in love with its insistence on recreating the container from scratch every step of the pipeline, among a bundle of other irksome quirks. There are certainly worse choices, though.
The opposite of Jenkins, where you have shared workspaces and have to manually ensure the workspace is clean or suffer reproducibility issues from tainted workspaces.
I'm aware, but thank you. Unfortunately, given sufficiently large artifacts, the overhead of packaging, uploading, downloading and unpacking them at every step becomes prohibitive.
Hudson/Jenkins is just not architected for large, multi-project deployments, isolated environments, and specialized nodes. It can work if you do not need these features, but otherwise it's a fight against the environment.
You need a beefy master, and it is your single point of failure. Untimely triggers of heavy jobs overwhelm the controller? All projects are down. Jobs need to be carefully crafted to be resumable at all.
Heavy reliance on the master means that even sending out webhooks on stage status changes is extremely error-prone.
When your jobs require certain tools to be available, you are expected to package those as part of the agent deployment, since Jenkins relies on host tools. In reality you end up rolling your own tool-management system that every job has to call in some canonical manner.
There is no built-in way to isolate environments. You can harden the system a bit with various ACLs, but in the end you either have to trust projects or build up and maintain separate infrastructure for different projects, isolated at the host level.
And in cases where significant processing time happens externally, you still have to block an executor.
Yeah, I was thinking of using it for us actually. It connects to everything, has lots of plugins, etc. I wonder where the hate comes from; they are all pretty bad, aren't they?
Will test Forgejo's CI first as we'll use the repo anyway, but if it ain't for me, it's going to be Jenkins, I assume.
- DSL is harder to get into.
- Hard to reproduce a setup unless builds are in DSL and Jenkins itself is in a fixed version container with everything stored in easily transferable bind volumes; config export/import isn't straightforward.
- Builds tend to break in a really weird way when something (even external things like Gitea) updates.
- I've had my setup break once after updating Jenkins and then not being able to update the plugins to match the newer Jenkins version.
- Reliance on system packages instead of containerized build environment out of the box.
- Heavier on resources than some of the alternatives.
Pros:
- GUI is getting prettier lately for some reason.
- Great extensibility via plugins.
- A known tool for many.
- Can mostly be configured via GUI, including build jobs, which helps to get around things at first (but leads into the reproducibility trap later on).
Wouldn't say there is a lot of hate, but there are some pain points compared to managed Gitlab. Using managed Gitlab/Github is simply the easiest option.
Setting up your own Gitlab instance + Runners with rootless containers is not without quirks, too.
CASC plugin + seed jobs keep all your jobs/configurations in files and update them as needed, and k8s + Helm charts can keep the rest of config (plugins, script approvals, nodes, ...) in a manageable file-based state as well.
We have our main node in a state that we can move it anywhere in a couple of minutes with almost no downtime.
I'll add another point to "Pros": Jenkins is FOSS and it costs $0 per developer per month.
I have previous experience with it, and I agree with most points. Jobs can be downloaded as XML config and thus kept/versioned, but the rest is valid. I just don't want to manage GitLab; we already have it at the corp level, we just can't use it right now in preprod/prod, and I need something that will be either throwaway or kept just for very specific tasks that shouldn't change much in the long run.
For a throwaway, I don't think Jenkins will be much of a problem. Or any other tool for that matter. My only suggestion would be to still put some extra effort into building your own Jenkins container on top of the official one [0]. Add all the packages and plugins you might need to your image, so you can easily move and modify the installation, as well as simply see what all the dependencies are. Did a throwaway, non-containerized Jenkins installation once which ended up not being a throwaway. Couldn't move it into containers (or anywhere for that matter) without really digging in.
Haven't spent a lot of time with it myself, but if Jenkins isn't of much appeal, Drone [1] seems to be another popular (and lightweight) alternative.
Do you have a page about each dataset you're sourcing and its background, like you provide here?
The "EFTA00000468" saga has me distrusting the authenticity of most of these datasets.