Initial Steps
The process begins with identifying and removing all images not directly related to the main article subject. This includes promotional or recommended content images, and those at the end of articles that don’t show the main subject. Only images that show people, places, or things explicitly mentioned in the article are kept.
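The filtering rule above can be sketched as follows. This is a minimal illustration, not the pipeline's actual implementation: the `Image` record, its `caption` and `promotional` fields, and the `mentioned_subjects` argument are all hypothetical, and simple case-insensitive substring matching stands in for whatever relevance judgment is really applied.

```python
from dataclasses import dataclass


@dataclass
class Image:
    """Hypothetical image record; the field names are assumptions for illustration."""
    url: str
    caption: str
    promotional: bool = False  # True for promotional or recommended-content images


def filter_images(images, mentioned_subjects):
    """Keep only non-promotional images whose caption names a subject from the article.

    Relevance is approximated by case-insensitive substring matching, which is an
    assumption; a real pipeline may judge relevance differently.
    """
    subjects = [s.lower() for s in mentioned_subjects]
    kept = []
    for image in images:
        if image.promotional:
            continue  # drop promotional or recommended-content images outright
        caption = image.caption.lower()
        if any(subject in caption for subject in subjects):
            kept.append(image)
    return kept
```

Under these assumptions, an image is dropped either because it is flagged as promotional or because its caption names none of the people, places, or things mentioned in the article.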
Content Type Identification
The next step is to determine the content type or genre, such as a news article, blog post, academic paper, or technical guide. This involves identifying the publication style, target audience, and expected conventions.
Content and Image Analysis
The content and images are analyzed to filter out irrelevant or low-quality elements. The key points and the core message are extracted and expressed in English. Important statistics, quotes, and data are noted, and the existing organizational patterns are improved where needed.
Genre-Appropriate Rewriting
The content is rewritten according to the identified genre, applying appropriate structures and styles. For news articles, this means following the inverted pyramid structure and maintaining objectivity. For blog content, a more conversational tone is used, while technical/educational content requires clear, logical structure and professional language.
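One way to picture the genre-specific rules is as a lookup table from genre to writing guidelines. The sketch below is illustrative only: the genre keys, the `GENRE_GUIDELINES` and `guidelines_for` names, and the choice to raise an error for an unknown genre are assumptions, while the guideline text itself comes from the description above.

```python
# Hypothetical lookup table; the keys and the structure are assumptions.
GENRE_GUIDELINES = {
    "news": ["follow the inverted pyramid structure", "maintain objectivity"],
    "blog": ["use a conversational tone"],
    "technical": ["use a clear, logical structure", "use professional language"],
}


def guidelines_for(genre):
    """Return the writing guidelines for a genre, ignoring case and outer whitespace."""
    key = genre.strip().lower()
    if key not in GENRE_GUIDELINES:
        # Assumption: an unrecognized genre is an error rather than a silent default.
        raise ValueError(f"no guidelines defined for genre: {genre!r}")
    return GENRE_GUIDELINES[key]
```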
Refining the Content
The rewritten content is refined to ensure adherence to genre conventions while maintaining a natural flow. Idioms and expressions common in English are used appropriately, and examples are adapted to fit the target audience's culture. The content is checked for obvious AI writing patterns, and proper attribution is ensured.
Final Validation
The final step validates the JSON output with JSON.parse() and confirms that all text has been translated to English. Line breaks are encoded as \n, and the JSON is checked for its required fields. Images are reviewed again for direct relevance to the main article subject, and any remaining promotional images are removed. Image descriptions are translated to English, and Markdown formatting is applied naturally to enhance readability.
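The validation checks could look roughly like this. Python's `json.loads` is used here as the counterpart of `JSON.parse()`; the required field names (`title`, `content`, `images`) and the `validate_output` helper are hypothetical, since the text does not list the actual schema.

```python
import json

# Hypothetical required fields; the real schema is not given in the text.
REQUIRED_FIELDS = ("title", "content", "images")


def validate_output(raw):
    """Parse a JSON string and return a list of problems (empty if it passes)."""
    try:
        data = json.loads(raw)  # raises json.JSONDecodeError on malformed input
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    return [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in data
    ]
```

A raw line break inside a JSON string is rejected by the parser, which is why line breaks in the output have to be written as the escape sequence `\n`.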