Hacker News | sachou's comments

ChatGPT will crop your image if asked; maybe they will fix that in the future.


glad to hear you like it!


Enjoy the free AI credits!


Yesterday was fun and crazy.

I have been working on my design tool for more than a month now. Yesterday, when I saw that OpenAI finally released their latest GPT-4o image generation model, "gpt-image-1", I decided to quickly add it and launch on Product Hunt!

And voilà! It's currently #8 product of the day, and I got my first paying user.

Please give it a try (free credits available) and share your feedback; it helps me so much to get it right!

Full focus on improving the product!


Noted, thank you for the nice reminder. Good idea; I'll add more free tools and an open playground.


Yes, exactly.

The main issue in scraping:

- If you scrape a lot, you will get blocked based on your IP, so you need to use proxies
- Scraping an entire website needs specific logic, retries, and more
- It becomes a heavy background job

All the above takes time, so if scraping is not a core feature of your business, it is likely better to outsource it.
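For what it's worth, the retry-plus-proxy part of the list above can be sketched in a few lines of Python. The proxy URLs and the `fetch` callable here are placeholders for illustration, not DataFuel's actual implementation:

```python
import itertools
import time

# Hypothetical proxy pool -- replace with real proxy endpoints.
PROXIES = ["http://proxy-a:8080", "http://proxy-b:8080", "http://proxy-c:8080"]
proxy_cycle = itertools.cycle(PROXIES)

def fetch_with_retries(fetch, url, max_attempts=3, backoff=1.0):
    """Call fetch(url, proxy), rotating proxies and retrying with
    exponential backoff when a request fails (blocked IP, timeout, ...)."""
    last_error = None
    for attempt in range(max_attempts):
        proxy = next(proxy_cycle)
        try:
            return fetch(url, proxy)
        except Exception as err:
            last_error = err
            time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"gave up on {url}") from last_error
```

In practice this loop would live inside a background job queue, with the per-URL state persisted so a crawl of a whole site can resume after failures.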

Good job doing it, though!


Some ideas:

Highlight the advantages of your service over DIY solutions prominently on your marketing site. The site looks great, but I think it could do more to convince developers to adopt your product rather than just listing features.

Consider reaching out to clients to quantify the time saved using your service. Emphasize how it eliminates the hassle of setting up custom background job processes, proxies, and other complexities that can snowball into a full-fledged project.

Good luck on your journey!


I guess so. If your goal is to have people know about your content, you might get a small SEO bump.


Do you need to embed it directly in Pinecone?

If yes then DataFuel is the right choice. Adding this feature as we speak.

Please let me know :)


Interesting but we process documents before embedding them, and have specific requirements for the embedder.

Having developed a couple of page-to-markdown converters myself, I think the bigger challenge is making sense of the many pages that rely on a spatial organisation of information that only makes sense to humans, or on the presence of images. One way to do it is to render the page as an image and extract the data with a vision LLM. But you need heuristics for when to do classic extraction and when to use vision, plus you have to get rid of cookie banners and overlays. This is more complex and costly, but it has real business value for whoever can pull it off.
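The "when to fall back to vision" decision can be approximated with a crude text-density check. This is one plausible rule sketched for illustration, not the parent commenter's actual pipeline, and the thresholds are invented:

```python
def choose_extraction_strategy(extracted_text: str, html: str,
                               min_chars: int = 200,
                               min_ratio: float = 0.01) -> str:
    """Heuristic: if the DOM yields too little readable text, either in
    absolute terms or relative to the raw HTML size, the layout probably
    carries the meaning, so render the page and use a vision model."""
    text_len = len(extracted_text.strip())
    if text_len < min_chars:
        return "vision"
    if text_len / max(len(html), 1) < min_ratio:
        return "vision"
    return "classic"
```

A real system would also weigh cost: vision extraction is an order of magnitude more expensive per page, so you only want it when classic extraction clearly failed.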


What would be your specific requirements?

Right now I'm adding chunk size and choice of embedding model; what else?

Images are a great challenge; as you mentioned, they can be handled with OCR or a vision model.


We, as many players, have custom pipelines on embedding. We don't split docs based on chunk size but do semantic chunking and chunk augmentation. We embed everything with two embeddings services to always have a fallback if one provider is not available.
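The two-provider fallback described above boils down to trying embedders in order until one succeeds. A minimal sketch, where the provider names and the `embed_fn(texts) -> vectors` signature are assumptions for illustration:

```python
def embed_with_fallback(texts, providers):
    """Try each embedding provider in order; return (name, vectors) from
    the first one that succeeds. `providers` is a list of
    (name, embed_fn) pairs where embed_fn(texts) returns a vector list."""
    errors = []
    for name, embed_fn in providers:
        try:
            return name, embed_fn(texts)
        except Exception as err:  # rate limit, outage, timeout...
            errors.append((name, err))
    raise RuntimeError(f"all embedding providers failed: {errors}")
```

One caveat with this design: different providers produce vectors of different dimensions, so a fallback switch generally means the collection must be re-embedded or kept in separate indexes.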

If I were in your shoes I would not think embedding and inserting in a vector store would be my responsibility, especially since there are so many different stores on the market.


Yes, exactly. We're working on building a competitive advantage, but the AI space is so big and only getting bigger.

Any feedback that could help DataFuel become more unique?


Good point! Thanks for the feedback.

Let me add that to my todos.


It boggles my mind that you would launch without that as a prime directive.


OP just graciously accepted that feedback, no need to be condescending :)


Let me translate what OP wrote:

> Good point! Thanks for the feedback.

> Nothing like this will be added to the product. The money comes from scraping content, so content will be scraped regardless of any no-scraping hints, and we will be actively working on countering anti-scraping measures.


Well... while true, how do you reconcile this with OP's statements in another thread?

> It has an extensive proxy IP and retry system in place to bypass bot detection.

Seems like a bit of "talking out of both sides of your mouth".


It's kind of tone-deaf to launch a tool like this without considering that in the current climate. Not a popular take on Hacker News, but everyone outside the tech space is pretty pissed about this stuff.


And proxy farms exist solely to get around this problem. If you believe the rights of content creators are the be-all and end-all, don't complain the next time Disney tries to extend IP expiration dates.


Using the behavior of one bad actor to excuse the abuse of everyone else is pretty bad.


I was recently on a project, and out of the 10+ devs on it, I was the only one who really knew about robots.txt, or at least the only one who said, "hey, that robots.txt needs to handle internationalized routes; the default paths we disallow are all in English."
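For anyone unfamiliar with the internationalized-routes issue: robots.txt rules match literal path prefixes only, so localized versions of a blocked route need their own Disallow lines. A hypothetical example (the paths are invented for illustration):

```
User-agent: *
Disallow: /admin/
Disallow: /checkout/
# The rules above only cover the English paths; localized
# routes need explicit entries of their own:
Disallow: /fr/admin/
Disallow: /fr/commande/
Disallow: /de/admin/
Disallow: /de/kasse/
```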

I don't say that makes them bad developers; they just knew other things. So it doesn't boggle my mind that someone launched a product like this without taking robots.txt into consideration, and then added it to the todos when someone complained.


Programmers have no institutional memory.


Agreed. I often had to explain why simple lower-level stuff was there, because they didn't happen to know about it and were surprised.

