Given two files, a and b, each containing 5 billion URLs of 64 bytes each, and a 4 GB memory limit, how do you find the URLs common to both files?

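Question type: technical interview question

This is a technical interview question, commonly asked in interviews at Australian IT companies.

Difficulty: hard

Categories: big data processing, massive data sets, file processing

Tags: hash partition, divide and conquer, external memory, set intersection

Reference answer summary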

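Core answer: each file is about 320 GB (5 billion URLs × 64 bytes), far more than the 4 GB of memory, so neither file can be loaded whole. The approach is divide and conquer plus hash partitioning, computing the intersection bucket by bucket.

Step 1: Partition file a. Scan file a, compute hash(url) % 1000 for each URL, and write the URL to the matching one of 1000 small files: a0, a1, ..., a999. Each small file is about 320 MB on average (320 GB / 1000).

Step 2: Partition file b. Scan file b in the same way, by hash(url) % ...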
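The partitioning steps in the summary can be sketched in Python as follows. This is a minimal illustration, not the site's reference answer: the summary is cut off after Step 2, so the per-bucket intersection in `common_urls` is an assumption about how the approach continues, based on the "set intersection" tag. The function names (`bucket_of`, `partition`, `common_urls`), the one-URL-per-line file layout, and the use of SHA-256 as the hash are all hypothetical choices made for the example.

```python
import hashlib
import os

NUM_BUCKETS = 1000  # mirrors hash(url) % 1000 in the summary


def bucket_of(url, num_buckets=NUM_BUCKETS):
    """Return a stable bucket index for a URL.

    Python's built-in hash() is salted per process for strings, so a
    digest-based hash is used here (an assumption, not from the source).
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def partition(src_path, out_dir, prefix, num_buckets=NUM_BUCKETS):
    """Stream src_path line by line and write each URL to its bucket file.

    Keeps num_buckets files open at once; with 1000 buckets the process
    may need a raised open-file limit.
    """
    paths = [os.path.join(out_dir, f"{prefix}{i}") for i in range(num_buckets)]
    outs = [open(p, "w", encoding="utf-8") for p in paths]
    try:
        with open(src_path, encoding="utf-8") as src:
            for line in src:
                url = line.strip()
                if url:
                    outs[bucket_of(url, num_buckets)].write(url + "\n")
    finally:
        for f in outs:
            f.close()
    return paths


def common_urls(path_a, path_b, work_dir, num_buckets=NUM_BUCKETS):
    """Yield each URL that appears in both files, once.

    Assumed continuation of the truncated summary: equal URLs hash to the
    same index, so only bucket a_i and bucket b_i need to be compared.
    """
    a_parts = partition(path_a, work_dir, "a", num_buckets)
    b_parts = partition(path_b, work_dir, "b", num_buckets)
    for pa, pb in zip(a_parts, b_parts):
        # Only one bucket of a is held in memory at a time.
        with open(pa, encoding="utf-8") as fa:
            seen = {line.rstrip("\n") for line in fa}
        with open(pb, encoding="utf-8") as fb:
            for line in fb:
                url = line.rstrip("\n")
                if url in seen:
                    seen.discard(url)  # avoid reporting duplicates twice
                    yield url
```

The key property is that identical URLs always land in buckets with the same index, so the 320 GB problem reduces to 1000 independent comparisons of roughly 320 MB each. A Python set of strings carries noticeable per-object overhead beyond the raw bucket size, so in practice the bucket count might need to be raised, or a skewed bucket partitioned again with a different hash.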

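Answering tips

For a technical interview question, organize your thinking before you answer: start from the basic concepts and go progressively deeper. Where possible, use real project experience to explain the technical principles, which shows both the depth of your understanding and your hands-on ability.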

© 2017-2026 JR Academy Pty Ltd. All rights reserved.