Initial Image Curation (Critical)
- Identify and REMOVE ALL images not directly related to the main article subject
- MUST DELETE ALL promotional or recommended content images
- MUST DELETE ALL images at the end of articles that don’t show the main subject
- Keep ONLY images that show people, places, or things explicitly mentioned in the article
- When in doubt about relevance, ALWAYS prioritize removal over inclusion
Content Type Identification
- Determine content type/genre (news article, blog post, academic paper, technical guide, etc.)
- Identify publication style (formal news, tabloid, personal blog, corporate document, etc.)
- Recognize target audience and expected conventions
- Note level of formality appropriate for the content type
- Identify key structural elements expected in this genre
- Observe tone requirements (objective for news, personal for blogs, etc.)
Content and Image Analysis
- Filter out irrelevant or low-quality content and images
- Identify the content domain/industry
- Extract key points and core ideas in English
- Identify the core message and purpose of the article
- Note important statistics, quotes, or data
- Observe existing organizational patterns and improve where needed
- Identify attribution patterns for sources and information
Genre-Appropriate Rewriting
- Apply the appropriate structure for the identified content type
- For news articles: Follow inverted pyramid structure, use clear and concise language, maintain objective perspective, include proper attribution
- For blog content: More conversational tone, personal perspective can be included, informal language acceptable
- For technical/educational content: Clear and logical structure, professional but accessible language, consistent terminology
Human Writing Characteristics by Genre
- For news articles: Vary paragraph length, mix quote-first and attribution-first sentences, balance background context with current information
- For blog content: More personal voice, occasional digressions, varied paragraph lengths and structures
- For technical/educational content: Expert voice, natural teaching patterns, varied depth of technical detail that concentrates on the most important points
Anti-Detection Techniques for All Genres
- Avoid perfectly balanced presentation of all information
- Don’t use identical paragraph or sentence structures repeatedly
- Vary transition phrases naturally
- Include appropriate emphasis on certain points
- Avoid formulaic summary paragraphs
- Use genre-appropriate analogies or examples
Refinement and Quality Control
- Ensure adherence to genre conventions while maintaining natural flow
- Use common English idioms and expressions appropriate to the content type
- Maintain appropriate formality level for the genre
- Check for and remove any obvious AI writing patterns
- Ensure proper attribution and sourcing for the content type
Final Validation
- Validate JSON with JSON.parse() before returning
- Ensure ALL text has been translated to English
- Check that all line breaks use the escape sequence \n, with no actual line breaks
- Verify that the JSON is well-formed and contains all required fields
- Review each image again to confirm direct relevance to main article subject
- Verify all promotional or suggested content images have been removed
- Confirm all images at the end of the article that are not directly about the main subject have been removed
- Double-check that all remaining image descriptions have been translated to English
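
The mechanical checks in this list (parsing with JSON.parse(), required fields, escaped line breaks) could be sketched roughly as follows. This is a minimal TypeScript sketch under stated assumptions, not a definitive implementation: the function name `validateOutput`, the `ValidationResult` shape, and any field names passed in are hypothetical, since the required fields are not listed here. It also assumes the output is a single JSON object serialized on one line.

```typescript
// Hypothetical result shape: not defined anywhere in the checklist above.
interface ValidationResult {
  ok: boolean;
  errors: string[];
}

// Hypothetical helper. `requiredFields` is supplied by the caller because
// the checklist does not name the required fields.
function validateOutput(raw: string, requiredFields: string[]): ValidationResult {
  const errors: string[] = [];

  // Line breaks must appear as the two-character escape \n inside strings,
  // so the serialized text itself should contain no literal CR or LF.
  if (/[\r\n]/.test(raw)) {
    errors.push("raw output contains a literal line break");
  }

  // JSON.parse throws a SyntaxError on malformed input.
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    errors.push(`invalid JSON: ${(e as Error).message}`);
    return { ok: false, errors };
  }

  // Assumption: the expected top-level value is an object, not an array or scalar.
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    errors.push("top-level value is not a JSON object");
    return { ok: false, errors };
  }

  // Report every required field that is absent.
  const obj = parsed as Record<string, unknown>;
  for (const field of requiredFields) {
    if (!Object.prototype.hasOwnProperty.call(obj, field)) {
      errors.push(`missing required field: ${field}`);
    }
  }

  return { ok: errors.length === 0, errors };
}

// Usage with hypothetical field names "title" and "body".
const result = validateOutput('{"title":"A","body":"one\\ntwo"}', ["title", "body"]);
console.log(result.ok, result.errors);
```

The image-relevance and translation checks in the list are judgment calls and are deliberately left out of this sketch; only the checks that can be verified mechanically are covered.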