1. Your data is severely imbalanced, so accuracy is a very misleading metric here. From what I see, you have roughly a 1:20 ratio of malicious to non-malicious samples. That both skews the metrics and biases the classifier.
2. Adding to the other comment asking for calibration curves: look at your minority-class performance in terms of precision, recall, F-beta, and average precision (area under the precision-recall curve).
3. Then see whether resampling helps or hurts predictive performance; the answer typically speaks to the level of noise and small disjuncts in the data.
4. I see you've done an 80/20 train-test split, but try to eliminate split bias by using stratified cross-validation. That would ensure you didn't just get lucky with random seed = 42 and draw a really favorable test set.
All of these can be implemented using sklearn and imbalanced-learn [0]. Not included: a deeper dive into cost-sensitive and adversarial techniques. Let me know if you have any more questions, and keep up the good work!
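A minimal sketch of points 2-4 using scikit-learn only, on synthetic stand-in data (the real query features and model aren't shown in the post, so the dataset and classifier below are placeholders; imbalanced-learn's `RandomOverSampler` would do the resampling step in one call):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (average_precision_score, fbeta_score,
                             precision_score, recall_score)
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.utils import resample

# Synthetic stand-in for the query dataset, at roughly the 1:20
# imbalance estimated above (swap in the real features and labels).
X, y = make_classification(n_samples=10_000, weights=[0.95, 0.05],
                           random_state=0)

clf = LogisticRegression(max_iter=1000)

# Point 4: stratified 5-fold CV instead of a single 80/20 split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
ap_scores = cross_val_score(clf, X, y, cv=cv, scoring="average_precision")
print(f"avg precision across folds: "
      f"{ap_scores.mean():.3f} +/- {ap_scores.std():.3f}")

# Point 2: minority-class metrics on one held-out fold.
train_idx, test_idx = next(cv.split(X, y))
clf.fit(X[train_idx], y[train_idx])
pred = clf.predict(X[test_idx])
proba = clf.predict_proba(X[test_idx])[:, 1]
print("precision:", precision_score(y[test_idx], pred))
print("recall:   ", recall_score(y[test_idx], pred))
print("F2:       ", fbeta_score(y[test_idx], pred, beta=2))
print("avg prec: ", average_precision_score(y[test_idx], proba))

# Point 3: random oversampling of the minority class, applied to the
# training fold only (never the test fold, or scores are inflated).
X_maj = X[train_idx][y[train_idx] == 0]
X_min = X[train_idx][y[train_idx] == 1]
X_min_up = resample(X_min, n_samples=len(X_maj), random_state=0)
X_res = np.vstack([X_maj, X_min_up])
y_res = np.concatenate([np.zeros(len(X_maj)), np.ones(len(X_min_up))])
clf.fit(X_res, y_res)
print("recall after oversampling:",
      recall_score(y[test_idx], clf.predict(X[test_idx])))
```

Comparing the before/after recall and average precision tells you whether resampling is helping on your data or just overfitting duplicated minority points.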
Thank you so much for these suggestions. I'll definitely try them and let you know.
One thing to add: the data is not that imbalanced. I only used 100,000 non-malicious and 50,000 malicious queries, so it's actually 2:1. I didn't use all of the non-malicious queries.
A machine learning driven web application firewall
http://fsecurify.com/fwaf-machine-learning-driven-web-applic...