Given a 1GB file in which each line holds one word (at most 16 bytes), and only 1MB of memory, how do you return the 100 most frequent words?

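Question type: technical interview question

This is a technical interview question that commonly comes up in interviews at Australian IT companies.

Difficulty: hard

Categories: big-data computation, Top-K problems, file divide-and-conquer

Tags: hash bucketing, file splitting, Trie, HashMap, min-heap, merge

Reference answer summary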

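Core idea: hash bucketing + per-file counting + Top-K + merge.

Step 1: Bucket split. Read the file sequentially; for each word x compute hash(x) % 5000 and append the word to one of 5000 small files (x0..x4999), so that each file is about 200KB on average. Identical words always land in the same file. If some file still exceeds 1MB, split it again recursively in the same way until no file exceeds 1MB; each level needs a different hash function, because re-applying the same hash(x) % 5000 would send every word back into a single file.

Step 2: Counting and local Top 100. For each small file, count word frequencies (can use tri...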
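The outline above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference answer itself: the source text is cut off partway through Step 2, so the merge step here follows only the one-line outline (local Top-K lists merged through a min-heap). The function names (`bucket_of`, `split_into_buckets`, `local_top_k`, `top_k_words`) are hypothetical, `zlib.crc32` is a stand-in for the unspecified `hash`, and a `Counter` replaces the trie or hash map. The sketch does not implement the recursive re-split of oversized buckets, and Python's object overhead means it does not literally fit in 1MB.

```python
import heapq
import os
import tempfile
import zlib
from collections import Counter

NUM_BUCKETS = 5000  # from the answer summary: hash(x) % 5000
TOP_K = 100


def bucket_of(word, num_buckets):
    # Assumption: crc32 stands in for the unspecified hash function.
    # It is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(word.encode("utf-8")) % num_buckets


def split_into_buckets(src_path, bucket_dir, num_buckets=NUM_BUCKETS):
    """Step 1: stream the big file, appending each word to its bucket file."""
    with open(src_path, encoding="utf-8") as src:
        for line in src:
            word = line.strip()
            if not word:
                continue
            name = "x%d" % bucket_of(word, num_buckets)
            # Reopening per word is slow but keeps only one handle open;
            # a real implementation would buffer writes per bucket.
            with open(os.path.join(bucket_dir, name), "a", encoding="utf-8") as out:
                out.write(word + "\n")


def local_top_k(bucket_path, k=TOP_K):
    """Step 2: count one bucket in memory and keep its k most frequent words."""
    counts = Counter()
    with open(bucket_path, encoding="utf-8") as f:
        for line in f:
            counts[line.strip()] += 1
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])


def top_k_words(src_path, k=TOP_K, num_buckets=NUM_BUCKETS):
    """Merge: fold every bucket's local top-k into one min-heap of size k."""
    heap = []  # holds (count, word); heap[0] is the weakest candidate kept
    with tempfile.TemporaryDirectory() as bucket_dir:
        split_into_buckets(src_path, bucket_dir, num_buckets)
        for name in os.listdir(bucket_dir):
            for word, count in local_top_k(os.path.join(bucket_dir, name), k):
                if len(heap) < k:
                    heapq.heappush(heap, (count, word))
                elif (count, word) > heap[0]:
                    heapq.heapreplace(heap, (count, word))
    # Most frequent first; ties are ordered by word, descending.
    return [(word, count) for count, word in sorted(heap, reverse=True)]
```

The merge is valid because every occurrence of a given word hashes to the same bucket, so a word's count inside its bucket is already its global count, and the global top 100 must be among the per-bucket top 100 lists.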

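Answering tips

For a technical interview question, organize your thoughts before answering: start from the basic concepts and go deeper step by step. Tie the technical principles to real project experience to show both your depth of understanding and your hands-on ability.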

