I have been working on my design tool for more than a month now.
And yesterday, when I saw that OpenAI finally released their latest GPT-4o image generation model, gpt-image-1, I decided to quickly add it and launch on Product Hunt!
And voilà! It's currently #8 product of the day, and I got my first paying user.
Please give it a try (free credits available) and leave your feedback in the comments; it helps me so much to get it right!
- If you scrape a lot, you will be blocked based on your IP, so you need to use proxies
- Scraping an entire website needs specific logic, retries, and more
- It becomes a heavy background job
All of the above takes time, so if scraping is not a core feature of your business, it is likely better to outsource it.
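To make the first two bullets concrete, the retry-plus-proxy-rotation logic looks roughly like this. Everything here is illustrative: `fetch` stands in for whatever HTTP call you use, and the proxy list is a placeholder.

```python
import itertools
import time

def fetch_with_retries(url, fetch, proxies, max_attempts=4, base_delay=1.0):
    """Try fetching `url`, rotating through `proxies` and backing off
    exponentially between failed attempts. `fetch(url, proxy)` is any
    callable that raises on a blocked or failed request."""
    pool = itertools.cycle(proxies)
    last_error = None
    for attempt in range(max_attempts):
        proxy = next(pool)
        try:
            return fetch(url, proxy)
        except Exception as err:  # blocked IP, timeout, 5xx, ...
            last_error = err
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise RuntimeError(f"giving up on {url}") from last_error
```

In a real crawler you would also persist progress and run this inside a job queue, which is where the "heavy background job" part comes from.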
Highlight the advantages of your service over DIY solutions prominently on your marketing site. The site looks great, but I think it could focus more on convincing developers to adopt your product rather than just listing features.
Consider reaching out to clients to quantify the time saved using your service. Emphasize how it eliminates the hassle of setting up custom background job processes, proxies, and other complexities that can snowball into a full-fledged project.
Interesting, but we process documents before embedding them and have specific requirements for the embedder.
Having developed a couple of page-to-markdown converters myself, I think the bigger challenge is making sense of the many pages that rely on a spatial organisation of information that only makes sense to humans, or even on the presence of images. One way to do it is to render the page as an image and extract the data with a vision LLM. But you do need heuristics for when to do classic extraction and when to use vision, plus you have to get rid of cookie banners and overlays. This is more complex and costly, but it has real business value for those who can pull it off.
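The "when to use vision" heuristic can be as crude as comparing how much text classic extraction recovered against the size of the raw HTML. This is just a sketch of one possible heuristic; the thresholds are made-up numbers you would tune.

```python
def needs_vision(html_text, extracted_text, min_chars=200, min_density=0.05):
    """Crude routing heuristic: if classic extraction yields little text
    relative to the raw HTML, the page probably leans on visual layout
    or images, so fall back to rendering it and asking a vision LLM."""
    if len(extracted_text) < min_chars:
        return True
    density = len(extracted_text) / max(len(html_text), 1)
    return density < min_density
```

Pages that fail this check would then go through the render-and-vision path; everything else stays on the cheap classic extractor.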
We, like many players, have custom embedding pipelines. We don't split docs based on chunk size but do semantic chunking and chunk augmentation, and we embed everything with two embedding services so we always have a fallback if one provider is unavailable.
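The two-provider fallback described above is essentially a try-in-order loop. A minimal sketch, assuming each provider exposes an `embed_fn(chunks)` that returns one vector per chunk and raises on an outage:

```python
def embed_with_fallback(chunks, providers):
    """Try each embedding provider in order and return vectors from the
    first that succeeds. `providers` is a list of (name, embed_fn)
    pairs; embed_fn(chunks) -> list of vectors, raising on failure."""
    errors = []
    for name, embed_fn in providers:
        try:
            return name, embed_fn(chunks)
        except Exception as err:
            errors.append((name, err))
    raise RuntimeError(f"all embedding providers failed: {errors}")
```

One caveat with this pattern: the two providers produce vectors in different spaces, so you have to track which model embedded each chunk and query with the same one.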
If I were in your shoes, I would not consider embedding and inserting into a vector store to be my responsibility, especially since there are so many different stores on the market.
> Nothing like this will be added to the product. Money comes from scraping content, and thus content will be scraped regardless of any no-scraping hints, and we will be actively working on countering anti-scraping measures.
It's kind of tone-deaf to launch a tool like this without considering that in the current climate. It's not a popular take on Hacker News, but everyone outside the tech space is pretty pissed about this stuff.
And proxy farms exist solely to get around this problem. If you believe the rights of content creators are the end-all be-all, don't complain the next time Disney tries to extend IP expiration dates.
I was recently on a project where, out of the 10+ devs on it, I was the only one who really knew about robots.txt, or at least the only one who pointed out that our robots.txt needed to handle internationalized routes; the default routes we disallowed were all in English.
I'm not saying that makes them bad devs; they just knew other things. So it doesn't boggle my mind that someone launched a product like this without taking obeying robots.txt into consideration, and then added it to the todos when someone complained.
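For illustration, the internationalized-routes fix amounts to expanding every disallowed path with each locale prefix, so `/fr/admin` is blocked along with `/admin`. The paths and locales below are made up:

```python
def localized_disallows(paths, locales):
    """Emit robots.txt Disallow lines for each path, both bare and
    prefixed with every locale, so localized routes are covered too."""
    rules = []
    for path in paths:
        rules.append(f"Disallow: /{path}")
        for loc in locales:
            rules.append(f"Disallow: /{loc}/{path}")
    return rules
```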