I've been working on Elodie for over a year. It's organized over 15,000 of my personal photos and videos for me. It's also helped me craft a hands-free backup system.
I've written pretty extensively about it so I'll just link to those posts.
Neat! I'd been meaning to write a FUSE filesystem for my personal use that accomplished exactly this with EXIF data. I don't feel as motivated to do that now; I'll give Elodie a spin instead.
I think this GitHub issue [0] refers to what you're saying. I've sort of accomplished it with scheduled tasks but something a bit more native is a great idea.
I wrote about it a bit here [1] under "the file system as a real time reflection of its contents"
I've been working on my Electron-powered desktop launcher (Alfred alternative) for the past year or so on and off. The idea is to eventually enable compatibility with existing Alfred workflows and also provide a rich DX for creating themes/plugins via the CLI tool that comes along with it.
I have been doing a lot of customer development lately and I needed a tool that made it easy to reach out to large-ish groups of people whilst testing the efficacy of different messages.
I chose to build my own product because I could not find another tool that was simple enough to use (all the others were full-blown CRMs), I did not want to pay for the subscription to those other tools and (most importantly) I wanted to use this project to learn a couple new things (Javascript ES6, using Google APIs, building task queues).
It's pretty rough and I built it just for myself (I don't allow signups because I took a couple security shortcuts in the design) but it works and has increased my productivity dramatically.
I had been wanting a web app where I could quickly open GPX, KML, and KMZ files to see them on a map. I use it to look at hikes, and some KML files from some experimental mapping tools I've worked with in the past.
This is the result: http://87.is/mapspray/ . There are sample files on the page that you can load.
The main unique feature of this project is its handling of KML file styles, beyond what is preserved by e.g. converting them into GeoJSON. Since it serves my needs, I've run out of steam on it and it will probably remain incomplete for the foreseeable future.
I've been working on Otter, which is an Operational Transformation engine (think Google Docs real-time collaboration). https://github.com/TheAustinSeven/otter
The hope is that this will make it a little easier to build collaborative apps. More recently, though, I have been spending some time designing a new programming language (I know, I know, we already have so many), so I haven't spent as much time on Otter.
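For anyone unfamiliar with OT, the core idea is a transform function that rewrites one user's operation against a concurrently applied one so that both replicas converge. Here's a minimal sketch for the simplest case, two concurrent character inserts (my own illustration, not Otter's actual API):

```python
def transform_insert(op_a, op_b):
    """Transform insert op_a against concurrently applied op_b.

    An op is (position, text). If op_b inserted at or before op_a's
    position, op_a shifts right by the length of op_b's text.
    (Real engines break position ties with a site ID; omitted here.)
    """
    pos_a, text_a = op_a
    pos_b, text_b = op_b
    if pos_a < pos_b:
        return op_a
    return (pos_a + len(text_b), text_a)

def apply_insert(doc, op):
    pos, text = op
    return doc[:pos] + text + doc[pos:]

# Two users edit "hello" concurrently:
a = (5, "!")    # user A appends "!"
b = (0, "oh ")  # user B prepends "oh "

# Each side applies its own op first, then the transformed remote op.
doc_a = apply_insert(apply_insert("hello", a), transform_insert(b, a))
doc_b = apply_insert(apply_insert("hello", b), transform_insert(a, b))
assert doc_a == doc_b == "oh hello!"  # both replicas converge
```

A full engine also has to handle deletes, compose operation histories, and deal with server round-trips, which is where it gets genuinely hard.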
I do weekend projects full-time now, well, as much as coding whenever I feel like it counts as full-time. I recently finished https://ForwardMX.io and now lack motivation for anything else, so I've started working on updating my TUI library for Ruby: https://github.com/b1nary/rutui
In your blog post about the launch:
This is essentially how this all works. It is not exactly magic, but not easy [ether]
also [failsave] a few words ahead.
Nevertheless, it was an interesting read.
You mention a "few weeks" of development and testing. How many weeks, exactly?
Thanks for the corrections, fixed. Obviously not my native language :)
Well, I've run the email server that is now the second server for nearly half a year, I think. Developing the site, writing tests, and binding it to the email server took maybe 2-3 weeks. It's hard to tell because I had the landing page and backend developed shortly after the first test server was set up, then finished it much later within a little more than a week (I think; I tend to underestimate these things :).
To be fair: I used my own Rails template, which saved some time, and I reused a lot of Stripe code from a different project as well.
1. Your data is severely imbalanced, so accuracy is a very misleading metric to use here. From what I see, you have a 1:20 imbalance (malicious vs non-malicious distribution). This both skews the metrics and biases the classifier.
2. I'd like to add to the other comment asking you for calibration curves: see what your minority-class performance looks like in terms of precision, recall, F-beta, and average precision (area under the precision-recall curve).
3. Then try resampling and see whether it helps or hurts predictive performance; the answer typically speaks to the level of noise and small disjuncts in the data.
4. I see you've done a 0.2 train-test split, but try to eliminate split bias by using stratified cross-validation. This ensures you didn't just get lucky with random seed = 42 and draw a really favorable test set.
All of these can be implemented with scikit-learn and imbalanced-learn [0]. Not included: a deeper dive into cost-sensitive and adversarial techniques. Let me know if you have any more questions, and keep up the good work!
Thank you so much for these suggestions, I'll surely try these and will let you know.
One thing to add: the data is not that imbalanced. I only used 100,000 non-malicious and 50,000 malicious queries, so it's actually 2:1. I didn't use all the non-malicious queries.
An Elixir library, named Ashliah, to parse the IEX-TP protocol [0]. IEX-TP is the protocol the Investors Exchange uses. I plan to use it to build a service selling real-time stock data.
My other is an Amazon affiliate site that has lists of 4+ star amazon programming books[1]. I'm constantly adding more books and languages to it.
Finally, I'm working on Plsm [2], my Elixir library that generates Ecto models from an existing database table. It currently supports MySQL and Postgres.
Right now it has an index of ~70k conference talks / lectures / speeches. I'm working on improving it to get slide text and audio quality (for ranking), and getting more historical content.
I started out scraping sites manually, then started automating more pieces (a lot of sites use WordPress, so they're pretty structured). I'm working on a talk on the subject, so I'll have an article soon that explains it better :)
Snakepit is a docker-enabled framework for analysis and triage of malware samples in a networked and containerized environment. It's designed for the easy addition of whatever tools you want to use. Written in Python, rarely worked on when fully sober.
This weekend I'm hoping to compile and install a simple Gear VR project w/ Unity. I got all the tooling installed yesterday but haven't had a chance to open it yet. The goal is to get a simple "hello world" level application installed on my phone.
An admin interface (among other improvements) for Platypus (https://github.com/GGServers/platypus). It's an (admittedly simple) server monitoring application written in Python.
https://github.com/jmathai/elodie
[0] (motivation) https://medium.com/vantage/understanding-my-need-for-an-auto...
[1] (solution) https://medium.com/@jmathai/introducing-elodie-your-personal...
[2] (adaptation for google photos) https://medium.com/swlh/my-automated-photo-workflow-using-go...
[3] (one year reflection) https://artplusmarketing.com/one-year-of-using-an-automated-...
[4] (protecting against bit rot) https://medium.com/vantage/how-to-protect-your-photos-from-b...